JP2010186307A

JP2010186307A - Moving image content identification apparatus and moving image content identification method

Info

Publication number: JP2010186307A
Application number: JP2009029790A
Authority: JP
Inventors: Yusuke Uchida; 祐介内田; Masayuki Hashimoto; 真幸橋本; Akio Yoneyama; 暁夫米山
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2009-02-12
Filing date: 2009-02-12
Publication date: 2010-08-26

Abstract

PROBLEM TO BE SOLVED: To enable identification of moving image content even if a part of reference moving image content on the time axis is cut out or a telop or a logo is inserted into the reference moving image content, and to perform this processing rapidly.

SOLUTION: A moving image content identification apparatus includes: a moving image content input unit 11 for inputting moving image content; a feature amount extraction unit 12 for selecting key frames from a plurality of frames constituting the input moving image content and extracting a feature amount of each selected key frame; and a database search unit 14 for searching, when identified moving image content has been input from the moving image content input unit 11 and the feature amount extraction unit 12 has extracted the feature amounts of the key frames included in the identified moving image content, the feature amounts of the key frames of reference moving image content stored in a database 13 for each extracted feature amount.

COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to a moving image content identification apparatus and a moving image content identification method for determining whether moving image content stored on a hard disk drive, other media, network storage, or the like includes a part of specific moving image content.

With the recent spread of broadband and the growing capacity of storage such as HDDs (Hard Disk Drives), DVDs (Digital Versatile Disks), and Blu-ray discs, it has become easy to share and publish digital content over a network without the permission of the copyright holder or content provider, and such unauthorized sharing and publication has become a problem. To address this problem, techniques have been proposed that use a fingerprint (feature amount) of digital content to automatically detect, from among a plurality of digital contents, specific content that the copyright holder has not licensed for free distribution.

In Patent Document 1, the feature amount of content is described using three-dimensional frequency analysis and principal component analysis. This method performs three-dimensional frequency analysis by applying frequency analysis in the time-axis direction (FFT) to the coefficients obtained by spatial frequency analysis (DCT), and then extracts a feature amount from the resulting coefficients by principal component analysis. In Patent Document 2, the feature amount used in Patent Document 1 is used to narrow down the specific content that is similar to distributed content. When the candidates cannot be narrowed down, the specific content most similar to the distributed content is determined using the phase-only correlation method, and a threshold is used to determine whether the two are the same content.

In Non-Patent Document 1, a feature amount called color layout is extracted from the whole of each video frame, and a plurality of frames are matched sequentially, which enables detection even when temporal editing has been applied, such as cutting out a part of the video.

In Non-Patent Document 2, feature points called corners are detected in each video frame, feature amounts are extracted from their surroundings, and the feature points are matched, so that illegally distributed content can be detected even when editing such as cutting out has been performed.

Patent Document 1: JP 2005-18675 A
Patent Document 2: JP 2006-285907 A

Non-Patent Document 1: E. Kasutani and A. Yamada, "The MPEG-7 color layout descriptor: a compact image feature description for high-speed image/video segment retrieval," in Proc. of ICIP, 2001, pp. 674-677.
Non-Patent Document 2: J. Law-To et al., "Video Copy Detection: A Comparative Study," in Proc. ACM CIVR'07, pp. 371-378, 2007.

However, the methods disclosed in Patent Documents 1 and 2 extract a single feature amount from each moving image content, so detection fails when editing in the time-axis direction is performed, for example when the moving image content is split in two. The method disclosed in Non-Patent Document 1 extracts only one feature amount from the entire screen, so detection fails when spatial editing such as inserting a telop or a logo is performed. The method disclosed in Non-Patent Document 2 extracts dozens of feature points from one screen and matches all of them, so extracting and matching the feature points takes too much time.

The present invention has been made in view of these circumstances. Its object is to provide a moving image content identification apparatus and a moving image content identification method that can identify moving image content even when a part of the reference moving image content on the time axis has been cut out or a telop or a logo has been inserted into the reference moving image content, and that can perform this processing at high speed.

(1) To achieve the above object, the present invention takes the following measures. That is, the moving image content identification apparatus of the present invention determines whether identified moving image content, which is the identification target, includes all or part of reference moving image content, which is the identification criterion, and comprises: a moving image content input unit that inputs the moving image content; a feature amount extraction unit that selects key frames from the plurality of frames constituting the input moving image content and extracts the feature amounts of the selected key frames; and a database search unit that, when the identified moving image content has been input from the moving image content input unit and the feature amount extraction unit has extracted the feature amounts of the key frames included in the identified moving image content, searches, for each extracted feature amount, the feature amounts of the key frames of the reference moving image content stored in a database. Based on the number of feature amounts of the reference moving image content that the search finds to correspond to the feature amounts of the identified moving image content, the apparatus determines whether the identified moving image content includes all or part of the reference moving image content.

In this way, the feature amounts of the key frames of the identified moving image content are extracted, the feature amounts of the key frames of the reference moving image content stored in the database are searched for each extracted feature amount, and whether the identified moving image content includes all or part of the reference moving image content is determined based on the number of feature amounts of the reference moving image content that correspond to the feature amounts of the identified moving image content. Consequently, the moving image content can be identified even when all or part of the identified moving image content is a portion cut out of the reference moving image content along the time axis, or is reference moving image content into which a telop or a logo has been inserted, and the search time can be shortened and the search accuracy improved.

(2) In the moving image content identification apparatus of the present invention, when the identified moving image content is input from the moving image content input unit, the feature amount extraction unit acquires the time codes of the selected key frames of the identified moving image content. The database search unit calculates, for each feature amount of the identified moving image content, the difference between the time code of that feature amount and the time code of the corresponding feature amount of the reference moving image content, and determines that the identified moving image content includes all or part of the reference moving image content when the number of difference values having the same value is equal to or greater than a predetermined threshold.

In this way, the difference between the time code of each feature amount of the identified moving image content and the time code of the corresponding feature amount of the reference moving image content is calculated, and the identified moving image content is determined to include all or part of the reference moving image content when the number of difference values having the same value is equal to or greater than a predetermined threshold. Consequently, the moving image content can be identified even when all or part of the identified moving image content is a portion cut out of the reference moving image content along the time axis.
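The time-code difference test described in (2) can be sketched as follows. This is only an illustrative sketch under assumptions not stated in the source: time codes are taken to be integer frame numbers, the difference is taken as reference minus identified, and the function name `contains_reference` and its arguments are hypothetical.

```python
from collections import Counter


def contains_reference(matches, threshold):
    """Vote on time-code differences between matched key frames.

    matches   -- iterable of (identified_timecode, reference_timecode) pairs,
                 one pair per matched feature amount (assumed integer frames)
    threshold -- minimum number of identical differences needed for a hit
    Returns (decision, most_frequent_difference).
    """
    # Count how often each time-code difference occurs.
    votes = Counter(ref_t - query_t for query_t, ref_t in matches)
    if not votes:
        return False, None
    offset, count = votes.most_common(1)[0]
    # Many matches sharing one offset suggest a contiguous copied section.
    return count >= threshold, offset


# Three matches agree on an offset of 100 frames; one is an outlier.
print(contains_reference([(10, 110), (25, 125), (40, 140), (55, 300)], 3))
```

A cut-out section keeps the relative spacing of its key frames, so its matches pile up on one difference value, while accidental matches scatter across many values.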

(3) The moving image content identification apparatus of the present invention further comprises a feature amount storage unit that, when the reference moving image content has been input from the moving image content input unit and the feature amount extraction unit has extracted the feature amounts of the key frames included in the reference moving image content, stores the extracted feature amounts of those key frames.

In this way, when the reference moving image content has been input from the moving image content input unit and the feature amount extraction unit has extracted the feature amounts of the key frames included in the reference moving image content, the extracted feature amounts are stored, which makes it possible to build a database of the reference moving image content that serves as the identification criterion.

(4) In the moving image content identification apparatus of the present invention, the feature amount extraction unit sets key frames independently for each of a plurality of preset regions.

Because key frames are set independently for each of the plurality of preset regions, the moving image content can be identified even when a part of the reference moving image content on the time axis has been cut out or a telop or a logo has been inserted into the reference moving image content.

(5) In the moving image content identification apparatus of the present invention, the feature amount extraction unit extracts a feature amount from each of a plurality of preset regions of the key frame.

Because a feature amount is extracted from each of the plurality of preset regions of the key frame, high search accuracy can be achieved even when a caption or a logo has been inserted into a key frame of the identified moving image content.

(6) In the moving image content identification apparatus of the present invention, the feature amount storage unit creates an index for each extracted feature amount.

Because an index is created for each extracted feature amount, searches that use the feature amounts of the identified moving image content can be processed faster.

(7) In the moving image content identification apparatus of the present invention, the database search unit assigns an importance to the key frames and feature amounts of the identified moving image content extracted by the feature amount extraction unit.

Because an importance is assigned to the key frames and feature amounts of the identified moving image content extracted by the feature amount extraction unit, the search can start from the feature amounts of highest importance, so that high search accuracy is maintained even when the search is cut off after a fixed time.

(8) In the moving image content identification apparatus of the present invention, the importance is set based on the temporal stability of the key frame and the distinctiveness of the feature amount.

Because the importance is set based on the temporal stability of the key frame and the distinctiveness of the feature amount, the search can be cut off after a fixed time while its accuracy is kept as high as possible.
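The importance-ordered, time-limited search described in (7) and (8) can be sketched as follows. This is a minimal sketch, not the patented implementation: the importance scores are taken as given inputs, and the names `search_by_importance`, `lookup`, `budget_seconds`, and `clock` are hypothetical.

```python
import time


def search_by_importance(key_frames, lookup, budget_seconds, clock=time.monotonic):
    """Query key frames in descending importance until the budget runs out.

    key_frames     -- list of (importance, key_frame) pairs
    lookup         -- callable performing one database query (hypothetical)
    budget_seconds -- time allowed for the whole search
    clock          -- time source, injectable so the cutoff can be tested
    """
    deadline = clock() + budget_seconds
    results = []
    ordered = sorted(key_frames, key=lambda item: item[0], reverse=True)
    for _importance, key_frame in ordered:
        if clock() >= deadline:
            break  # cut the search off; the most important frames are done
        results.append((key_frame, lookup(key_frame)))
    return results
```

Sorting before querying means that whatever is left unsearched at the cutoff is the least important material, which is why accuracy degrades gradually rather than arbitrarily.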

(9) The moving image content identification method of the present invention determines whether identified moving image content, which is the identification target, includes all or part of reference moving image content, which is the identification criterion, and includes at least: a step of inputting the moving image content from a moving image content input unit; a step in which a feature amount extraction unit selects key frames from the plurality of frames constituting the input moving image content and extracts the feature amounts of the selected key frames; a step in which, when the identified moving image content has been input from the moving image content input unit and the feature amount extraction unit has extracted the feature amounts of the key frames included in the identified moving image content, a database search unit searches, for each extracted feature amount, the feature amounts of the key frames of the reference moving image content stored in a database; and a step of determining whether the identified moving image content includes all or part of the reference moving image content, based on the number of feature amounts of the reference moving image content that the search finds to correspond to the feature amounts of the identified moving image content.

According to the present invention, the feature amounts of the key frames of the identified moving image content are extracted, the feature amounts of the key frames of the reference moving image content stored in the database are searched for each extracted feature amount, and whether the identified moving image content includes all or part of the reference moving image content is determined based on the number of feature amounts of the reference moving image content that correspond to the feature amounts of the identified moving image content. Consequently, the moving image content can be identified even when all or part of the identified moving image content is a portion cut out of the reference moving image content along the time axis, or is reference moving image content into which a telop or a logo has been inserted, and the search time can be shortened and the search accuracy improved.

FIG. 1 is a block diagram of a moving image content identification apparatus according to an embodiment of the present invention.
FIG. 2A is a diagram showing an example of setting the rectangular regions $R^i$.
FIG. 2B is a diagram showing an example of setting the rectangular regions $R^i$.
FIG. 2C is a diagram showing an example of setting the rectangular regions $R^i$.
FIG. 3 is a diagram showing how a frame is selected and the feature amount $x^i(t)$ of the rectangular region $R^i$ is extracted.
FIG. 4 is a diagram showing the method of extracting a feature amount by Color layout.
FIG. 5 is a diagram showing the concept of selecting key frame candidates.
FIG. 6 is a flowchart showing the operation of the feature amount extraction unit 12.
FIG. 7 is a flowchart showing the operation of selecting key frames.
FIG. 8 is a flowchart showing the database search operation.

Embodiments of the present invention will be described below with reference to the drawings. In the present embodiment, the copyrighted content to be detected is input in advance as reference moving image content, its feature amounts are extracted, and a database is built. Thereafter, for each identified moving image content that is input, it is determined whether a part of the reference moving image content is included in the identified moving image content.

FIG. 1 is a block diagram of a moving image content identification apparatus according to an embodiment of the present invention. As shown in FIG. 1, the moving image content identification apparatus 10 comprises a moving image content input unit 11, a feature amount extraction unit 12, a database 13, a database search unit 14, and a feature amount storage unit 15. These components are connected to a control bus 16 and can exchange signals with one another.

The moving image content identification apparatus 10 receives the reference moving image content and the identified moving image content through the moving image content input unit 11. The moving image content input unit 11 inputs reference moving image content provided by a copyright holder or content provider. Normally a plurality of reference moving image contents are input, but in the present embodiment all of the input reference moving image contents are concatenated and handled as a single input moving image. The moving image content input unit 11 also inputs the moving image content to be identified, for example moving image content uploaded to a video sharing site or stored in various kinds of storage.

The feature amount extraction unit 12 extracts feature amounts from the reference moving image content and the identified moving image content. When the moving image content (its video signal) input from the moving image content input unit 11 is compressed by a compression scheme such as MPEG-2 or H.264, it is first decompressed. The following processing is then performed for each predetermined rectangular region $R^i$. The arrangement and number of the regions $R^i$ are arbitrary.

FIGS. 2A to 2C show examples of how the rectangular regions $R^i$ can be set. FIG. 2A shows the case where the entire screen is treated as one rectangular region; this setting is used when insertion of captions, logos, and the like is not anticipated. FIG. 2B shows a setting for the case where editing such as subtitles is anticipated at the bottom or the right of the screen; even if one of these areas is edited, the other rectangular region is unaffected. FIG. 2C shows a setting for the case where the location of editing is not determined in advance.

Hereinafter, the number of regions is denoted $N_R$. FIG. 3 shows how a frame is selected and the feature amount $x^i(t)$ of the rectangular region $R^i$ is extracted. As shown in FIG. 3, the feature amount $x^i(t) = (x^i_1(t), x^i_2(t), \ldots, x^i_{N_{Dim}}(t))$ is extracted from the region $R^i$ ($1 \le i \le N_R$) of each frame of the moving image content, where $t$ is the frame number and $N_{Dim}$ is the number of dimensions of the feature amount. The feature amount used here is essentially arbitrary. For example, Dominant color, Scalable color, Color structure, Color layout, Edge histogram, Contour shape, Motion activity, and other descriptors defined in MPEG-7 can be used.

Here, the case where Color layout as defined in MPEG-7 is used as the feature amount will be described. FIG. 4 shows the method of extracting a feature amount by Color layout. First, the region $R^i$ of frame $t$ is reduced to 8×8 pixels and, if necessary, converted to the YCbCr color system. A discrete cosine transform (DCT) is then performed, and from the resulting coefficients the lowest-frequency ones are selected in zigzag scan order as the feature amount: six coefficients for the Y component and three coefficients for each of the Cb and Cr components. In this case $N_{Dim} = 12$ (6 + 3 + 3). Alternatively, by limiting the coefficients used in Color layout to the Y component and not using the DC coefficient, a feature amount can be obtained that does not change under grayscale conversion or a luminance shift. Furthermore, normalizing the histogram before performing the DCT yields a feature amount that is robust to gamma correction and contrast adjustment.
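The Color layout extraction described above (shrink to 8×8, DCT, zigzag selection of 6 + 3 + 3 coefficients) can be sketched as follows. This is an illustrative sketch only, not a conformant MPEG-7 implementation: it assumes the region is already supplied as separate Y, Cb, and Cr planes of at least 8×8 samples, it shrinks by block averaging, and it omits the coefficient quantization that the standard defines. All function names are hypothetical.

```python
import math

N = 8  # side length of the thumbnail used by Color layout


def zigzag_order(n=N):
    """(row, col) positions of an n x n block in zigzag scan order."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    # Walk the anti-diagonals; odd ones run downwards, even ones upwards.
    return sorted(cells, key=lambda p: (p[0] + p[1],
                                        p[0] if (p[0] + p[1]) % 2 else p[1]))


def shrink(plane, n=N):
    """Reduce a plane (list of rows, at least n x n) to n x n by averaging."""
    h, w = len(plane), len(plane[0])
    out = []
    for u in range(n):
        r0, r1 = u * h // n, (u + 1) * h // n
        row = []
        for v in range(n):
            c0, c1 = v * w // n, (v + 1) * w // n
            block = [plane[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def dct2(block, n=N):
    """Orthonormal 2-D DCT-II of an n x n block."""
    def scale(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
    return [[scale(u) * scale(v) * sum(
                block[x][y]
                * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                * math.cos((2 * y + 1) * v * math.pi / (2 * n))
                for x in range(n) for y in range(n))
             for v in range(n)] for u in range(n)]


def color_layout(y, cb, cr, counts=(6, 3, 3)):
    """Return the 12-dimensional feature: 6 Y, 3 Cb, 3 Cr coefficients."""
    order = zigzag_order()
    feature = []
    for plane, k in zip((y, cb, cr), counts):
        coeffs = dct2(shrink(plane))
        feature.extend(coeffs[u][v] for u, v in order[:k])
    return feature
```

Passing `counts=(6, 0, 0)` and dropping the first element of the result would correspond to the Y-only variant without the DC coefficient mentioned in the text.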

FIG. 6 is a flowchart showing the operation of the feature amount extraction unit 12. Here, $t$ is the frame number of the moving image content and $i$ is the number of a preset region. As shown in FIG. 6, $t$ is set to 1 (step S1), $i$ is set to 1 (step S2), and a feature amount is extracted (step S3). Next, 1 is added to $i$ (step S4), and it is determined whether $i$ has become larger than the number of regions $N_R$ (step S5). If it has not, the process returns to step S3. If $i$ has become larger than $N_R$ in step S5, 1 is added to $t$ (step S6), and it is determined whether $t$ has become larger than $N_F$ (step S7), where $N_F$ is the number of frames of the input moving image content. If $t$ has not become larger than $N_F$ in step S7, the process returns to step S2; otherwise, the process ends.

In the present invention, after the feature amount $x^i(t)$ has been extracted from the region $R^i$ of each frame, specific frames are selected as key frames and only the feature amounts of those frames are used, rather than the feature amounts of all frames. That is, the feature amount extraction unit 12 sets key frames independently for each of the plurality of preset regions, and extracts a feature amount from each of the plurality of preset regions of the key frames.

FIG. 5 is a diagram showing the concept of selecting key frame candidates. In key frame selection, the frames $t$ at which the $k_j$-th component of the feature amount, $x^i_{k_j}(t)$, takes an extreme value are first taken as candidates ($1 \le k_j \le N_K$), where $N_K$ is the number of coefficients used. To reduce the influence of noise, a temporal smoothing filter (for example, a Gaussian) may be applied to each $x^i_{k_j}(t)$ before the extreme values are found. Concretely, the extreme values are found by locating, with the zero-crossing method, the $t$ at which ${x'}^{i}_{k_j}(t) = 0$. Furthermore, when, for a positive integer $W_j$ predetermined for each $k_j$, this $t$ satisfies

$${x'}^{i}_{k_j}(t - W_j) < {x'}^{i}_{k_j}(t - W_j + 1) < \cdots < {x'}^{i}_{k_j}(t) < \cdots < {x'}^{i}_{k_j}(t + W_j - 1) < {x'}^{i}_{k_j}(t + W_j)$$

or

$${x'}^{i}_{k_j}(t - W_j) > {x'}^{i}_{k_j}(t - W_j + 1) > \cdots > {x'}^{i}_{k_j}(t) > \cdots > {x'}^{i}_{k_j}(t + W_j - 1) > {x'}^{i}_{k_j}(t + W_j)$$

frame $t$ is taken as a key frame in the region $R^i$.

FIG. 7 is a flowchart showing the operation of selecting key frames. Here, $i$ is the number of a preset region and $j$ is the number of a coefficient used for extreme value detection. As shown in FIG. 7, $i$ is first set to 1 (step S11), $j$ is set to 1 (step S12), and all $t$ at which ${x'}^{i}_{k_j}(t)$ indicates an extreme value are extracted (step S13). Next, 1 is added to $j$ (step S14), and it is determined whether $j$ has become larger than $N_K$ (step S15). If it has not, the process returns to step S13. If $j$ has become larger than $N_K$ in step S15, 1 is added to $i$ (step S16), and it is determined whether $i$ has become larger than the number of regions $N_R$ (step S17). If $i$ has not become larger than $N_R$ in step S17, the process returns to step S12; otherwise, the process ends.
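The key frame selection described above (zero crossing of the derivative plus a monotonicity check over a window of half-width $W_j$, repeated for every region and coefficient) can be sketched as follows. This is a sketch under stated assumptions: the derivative is approximated by a central difference, a zero crossing between frames $t-1$ and $t$ is assigned to $t$, the optional smoothing filter is omitted, and indices are 0-based rather than 1-based as in the flowchart. The function names are hypothetical.

```python
def is_strictly_monotonic(values):
    """True if the sequence is strictly increasing or strictly decreasing."""
    pairs = list(zip(values, values[1:]))
    return all(a < b for a, b in pairs) or all(a > b for a, b in pairs)


def select_key_frames(x, w):
    """Key frames of one feature component x[t] for window half-width w."""
    n = len(x)
    # d[t] approximates the temporal derivative x'(t) (assumed definition).
    d = [0.0] * n
    for t in range(1, n - 1):
        d[t] = (x[t + 1] - x[t - 1]) / 2.0
    key_frames = []
    # Only visit t whose whole window lies where d is defined.
    for t in range(w + 1, n - w - 1):
        crosses_zero = (d[t - 1] < 0 <= d[t]) or (d[t - 1] > 0 >= d[t])
        if crosses_zero and is_strictly_monotonic(d[t - w:t + w + 1]):
            key_frames.append(t)
    return key_frames


def select_all(features, components, widths):
    """Loop over regions i and coefficients j as in the flowchart.

    features[i][t] -- feature vector of region i at frame t
    components[j]  -- index k_j of the component used for detection
    widths[j]      -- window half-width W_j for that component
    """
    result = {}
    for i, series in enumerate(features):
        for j, (k, w) in enumerate(zip(components, widths)):
            result[(i, j)] = select_key_frames([vec[k] for vec in series], w)
    return result
```

A larger `w` rejects extrema caused by short blips, because the derivative around them does not stay monotonic over the wider window, so the surviving key frames are those that remain detectable after small temporal shifts.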

The feature amount extraction unit 12 selects key frames and extracts feature amounts from the identified moving image content in the same way as for the reference moving image content.

The feature amount storage unit 15 indexes the feature amounts extracted by the feature amount extraction unit 12 so that they can be searched quickly, and stores them in the database 13. The key frames selected by the feature amount extraction unit 12 and their time codes are also stored in the database 13. Specifically, the feature amount $x^i(t)$, the time code $t$, and the video ID are stored in a database $D^i_j$ for each region $i$ and each $k_j$-th component of the feature amount that served as the basis for key frame selection.

For this purpose, an index based on a tree structure or hashing can be used so that the search (nearest neighbor search) can be performed at high speed. Such methods include R-tree, ANN (Approximate Nearest Neighbor), and LSH (Locality Sensitive Hashing).

The database search unit 14 searches the database for the feature amounts extracted by the feature amount extraction unit 12 and retrieves partially matching content. The database search unit 14 assumes that part of the identified moving image content is composed of part of the reference moving image content, and estimates the common section. It then calculates the similarity of the common section candidates and determines, by means of a threshold, whether they are truly the same section. Let the key frames selected by the feature amount extraction unit 12 on the basis of the $k_j$-th component of the feature amount in region i be $t^{i}_{j}(1), t^{i}_{j}(2), \ldots, t^{i}_{j}(n^{i}_{j})$, where $n^{i}_{j}$ is the number of key frames selected on the basis of the $k_j$-th component of the feature amount in region i. For all i and j, the database search unit 14 compares the feature amounts of the key frames $t^{i}_{j}(1), t^{i}_{j}(2), \ldots, t^{i}_{j}(n^{i}_{j})$ with the feature amounts stored in the database $D^{i}_{j}$.

At this time, an importance is assigned to each key frame used to search the database, so that the search can be cut off after a fixed time while preserving as much accuracy as possible. That is, the search proceeds from the key frames of highest importance and is terminated once the time available for searching has elapsed. The importance $I^{i}_{j}(t)$ is defined by the following equation.

$$I^{i}_{j}(t) = \alpha P^{i}_{j}(t) + (1-\alpha)\, Q^{i}_{j}(t)$$

Here, $P^{i}_{j}(t)$ is a term that evaluates the robustness of the key frame against temporal shift (the temporal stability of the key frame), and $Q^{i}_{j}(t)$ is a term that evaluates the specificity of the feature amount of the key frame. They are defined as follows.

$$P^{i}_{j}(t) = \Bigl(\min\bigl(x^{i}_{k_j}(t) - x^{i}_{k_j}(t-W_j),\; x^{i}_{k_j}(t) - x^{i}_{k_j}(t+W_j)\bigr)\Bigr)^{\beta} \times \Bigl(\sum_{1 \le j \le N_K} \bigl({x'}^{i}_{k_j}(t)\bigr)^{2}\Bigr)^{-\gamma}$$

$$Q^{i}_{j}(t) = d_M\bigl(x^{i}(t)\bigr)$$

$d_M(x^{i}(t))$ is the Mahalanobis distance to the distribution of feature amounts obtained in advance from the reference moving image content. β and γ are parameters determined by tuning.
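The importance formula can be illustrated with a short sketch. Everything here is a hypothetical rendering for illustration: the function names and argument layout are invented, the Mahalanobis distance is simplified to a diagonal covariance (the text only specifies a Mahalanobis distance to a distribution estimated from reference content), and the stability term assumes that t is a local maximum within the window so that the base of the power is non-negative.

```python
import math


def temporal_stability(x, x_prime_all, t, w, beta, gamma):
    """P^i_j(t) for one region and one selecting coefficient k_j.

    x            -- series x^i_{k_j}(t) of the selecting coefficient
    x_prime_all  -- list of N_K series x'^i_{k_j}(t), one per coefficient
    t, w         -- key-frame index and window W_j
    Assumes t is a local maximum over the window and that the sum of
    squares is non-zero; both are assumptions of this sketch.
    """
    if t - w < 0 or t + w >= len(x):
        raise ValueError("window W_j does not fit around t")
    peak = min(x[t] - x[t - w], x[t] - x[t + w])
    energy = sum(series[t] ** 2 for series in x_prime_all)
    return (peak ** beta) * (energy ** -gamma)


def mahalanobis_diag(x, mean, var):
    """Mahalanobis distance under an assumed diagonal covariance."""
    return math.sqrt(sum((xi - m) ** 2 / v for xi, m, v in zip(x, mean, var)))


def importance(p, q, alpha):
    """I^i_j(t) = alpha * P^i_j(t) + (1 - alpha) * Q^i_j(t)."""
    return alpha * p + (1 - alpha) * q
```

With a full covariance matrix, `mahalanobis_diag` would be replaced by a distance that uses the inverse covariance; the diagonal form is used here only to keep the example free of external dependencies.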

Once the importance of the feature amount of each key frame has been calculated, the database search is performed starting from the feature amounts of the key frames with the highest importance. The present invention assumes that the identified moving image content is a copy of part of one of the reference moving image contents, and determines the start time from which the copy was made. The details are as follows.

Let TC be the time code of a key frame t selected on the basis of the $k_j$-th component of the feature amount in region i. The feature amount nearest to the feature amount $x^{i}(t)$ of this key frame is retrieved from the database $D^{i}_{j}$, and the time code of that nearest feature amount is denoted TC'. If this correspondence is correct, the identified moving image content was copied from time TC'-TC. TC'-TC is therefore estimated for each search, and the values of TC'-TC with high likelihood are taken as copy candidates by voting. To improve robustness and speed, the time estimate is quantized with a fixed parameter T, so a vote is actually cast for [(TC'-TC)/T]. Votes may be cast not only from the nearest neighbor but also from the K nearest neighbors; in that case, the computation time increases but the accuracy improves. Finally, the times that received more votes than the threshold Th are output as the detection result.
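The offset voting described above can be sketched as follows. This is a minimal illustration under stated assumptions: the nearest neighbor is found by brute force rather than through an index, the bracket [·] is interpreted as the floor function, and votes are keyed by video ID together with the quantized offset. The tuple layouts and function names are hypothetical.

```python
import math
from collections import Counter


def nearest(query, entries):
    """Brute-force nearest neighbour by squared Euclidean distance.

    entries are (feature, time_code, video_id) tuples.
    """
    return min(
        entries,
        key=lambda e: sum((a - b) ** 2 for a, b in zip(query, e[0])),
    )


def vote_offsets(queries, database, t_quant, th):
    """Vote on quantized offsets floor((TC' - TC) / T).

    queries  -- list of ((i, j), feature, TC) for the identified content
    database -- {(i, j): [(feature, TC', video_id), ...]}, i.e. D^i_j
    Returns the (video_id, bin) pairs that received more than th votes.
    """
    votes = Counter()
    for key, feature, tc in queries:
        entries = database.get(key)
        if not entries:
            continue  # nothing stored for this region/coefficient pair
        _, tc_ref, video_id = nearest(feature, entries)
        offset_bin = math.floor((tc_ref - tc) / t_quant)
        votes[(video_id, offset_bin)] += 1
    return sorted(k for k, n in votes.items() if n > th)
```

Quantizing by T means that key frames whose estimated offsets differ by a small timing error still fall into the same bin, which is the robustness benefit the text attributes to the fixed parameter.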

FIG. 8 is a flowchart showing the database search operation. Here, cnt is a counter of the number of feature amounts for which votes have been cast. As shown in FIG. 8, first, the importance is calculated (step S21), and a search is performed with the feature amount of highest importance (step S22). Next, the voting described above is performed (step S23), and 1 is added to cnt (step S24). It is then determined whether cnt has become greater than the threshold Th2 (step S25). If cnt is not greater than the threshold Th2 in step S25, the process returns to step S22. If cnt is greater than the threshold Th2, all times whose vote count is Th or more are output (step S26), and the process ends.
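The control flow of FIG. 8 can be sketched as a loop over key frames sorted by importance. This is a hedged illustration: the callback `search_and_vote` stands in for steps S22 and S23, the function and parameter names are hypothetical, and stopping when the candidates run out is an assumption added so that the loop always terminates.

```python
def search_by_importance(candidates, search_and_vote, th2):
    """Process key frames in descending importance until cnt exceeds th2.

    candidates      -- list of (importance, key_frame) pairs (step S21 done)
    search_and_vote -- callable performing the search and vote for a key frame
    th2             -- threshold Th2 on the number of processed feature amounts
    Returns the key frames that were actually searched, in order.
    """
    ordered = sorted(candidates, key=lambda c: c[0], reverse=True)
    cnt = 0
    processed = []
    for _, key_frame in ordered:
        search_and_vote(key_frame)   # steps S22 and S23
        processed.append(key_frame)
        cnt += 1                     # step S24
        if cnt > th2:                # step S25
            break
    return processed
```

After the loop ends, the accumulated votes would be thresholded against Th to produce the output of step S26.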

DESCRIPTION OF SYMBOLS
10 Moving image content identification apparatus
11 Moving image content input unit
12 Feature amount extraction unit
13 Database
14 Database search unit
15 Feature amount storage unit
16 Control bus

Claims

1. A moving image content identification apparatus that determines whether identified moving image content, which is the identification target, includes all or part of reference moving image content, which is the identification criterion, the apparatus comprising:
a moving image content input unit that inputs moving image content;
a feature amount extraction unit that selects key frames from the plurality of frames constituting the input moving image content and extracts a feature amount of each selected key frame; and
a database search unit that, when the identified moving image content is input from the moving image content input unit and the feature amount extraction unit extracts the feature amounts of the key frames included in the identified moving image content, searches, for each extracted feature amount, the feature amounts of the key frames of the reference moving image content stored in a database,
wherein, as a result of the search, whether the identified moving image content includes all or part of the reference moving image content is determined on the basis of the number of feature amounts of the reference moving image content that correspond to the feature amounts of the identified moving image content.

2. The moving image content identification apparatus according to claim 1, wherein the feature amount extraction unit acquires the time codes of the selected key frames of the identified moving image content when the identified moving image content is input from the moving image content input unit, and
the database search unit calculates, for each feature amount of the identified moving image content, the difference value between the time code of that feature amount and the time code of the feature amount of the reference moving image content corresponding to it, and determines that the identified moving image content includes all or part of the reference moving image content when the number of the calculated difference values is equal to or greater than a predetermined threshold value.

3. The moving image content identification apparatus according to claim 1, further comprising a feature amount storage unit that, when the reference moving image content is input from the moving image content input unit and the feature amounts of the key frames included in the reference moving image content are extracted by the feature amount extraction unit, stores the extracted feature amounts of the key frames included in the reference moving image content.

4. The moving image content identification apparatus according to claim 1, wherein the feature amount extraction unit sets key frames independently for a plurality of preset regions.

5. The moving image content identification apparatus according to claim 4, wherein the feature amount extraction unit extracts feature amounts from a plurality of regions set in advance from the key frame.

6. The moving image content identification apparatus according to claim 4, wherein the feature amount storage unit creates an index for each of the extracted feature amounts.

7. The moving image content identification apparatus according to claim 1, wherein the database search unit sets an importance for the key frames and the feature amounts of the identified moving image content extracted by the feature amount extraction unit.

8. The moving image content identification apparatus according to claim 7, wherein the importance is set based on temporal stability of a key frame and specificity of a feature amount.

9. A moving image content identification method for determining whether identified moving image content, which is the identification target, includes all or part of reference moving image content, which is the identification criterion, the method comprising at least:
a step of inputting moving image content from a moving image content input unit;
a step of, in a feature amount extraction unit, selecting key frames from the plurality of frames constituting the input moving image content and extracting a feature amount of each selected key frame;
a step of, when the identified moving image content is input from the moving image content input unit and the feature amount extraction unit extracts the feature amounts of the key frames included in the identified moving image content, searching, in a database search unit and for each extracted feature amount, the feature amounts of the key frames of the reference moving image content stored in a database; and
a step of determining, as a result of the search, whether the identified moving image content includes all or part of the reference moving image content on the basis of the number of feature amounts of the reference moving image content that correspond to the feature amounts of the identified moving image content.