JP5685324B2

JP5685324B2 - Method and apparatus for comparing pictures

Info

Publication number: JP5685324B2
Application number: JP2013547935A
Authority: JP
Inventors: レン，イエンソン; チヤン，フアンジヨー; ウツド，トーマス・エル
Original assignee: アルカテル−ルーセント
Priority date: 2011-01-07
Filing date: 2012-01-04
Publication date: 2015-03-18
Anticipated expiration: 2032-01-04
Also published as: KR101556513B1; WO2012093339A2; EP2661710A2; KR20130108427A; CN103430175A; CN103430175B; WO2012093339A3; JP2014506366A

Description

本発明は、映像を比較するための方法および装置に関する。 The present invention relates to a method and apparatus for comparing images.

たとえばＹｏｕＴｕｂｅ（登録商標）、ＧｏｏｇｌｅＶｉｄｅｏ、およびＹａｈｏｏ！Ｖｉｄｅｏなどの動画共有ウェブサイトでは、映像コンテンツはユーザによってサイトにアップロードされ、検索エンジンによって他者が利用できるようにすることができる。現在のウェブ映像検索エンジンは、ユーザによって入力された特定のテキストクエリに基づいて、それらの関連するスコアに従ってランク付けされた検索結果のリストを提供すると考えられる。ユーザは次いで、関心のある１つまたは複数の映像を見つけるために結果を検討しなければならない。 For example, YouTube (registered trademark), Google Video, and Yahoo! In video sharing websites such as Video, video content can be uploaded to the site by the user and made available to others by a search engine. Current web video search engines are believed to provide a list of search results ranked according to their associated score based on specific text queries entered by the user. The user must then review the results to find one or more videos of interest.

ユーザがホストに映像をアップロードし、映像を取得し、何らかの修正を行って再びそれらを分配するのは容易なので、映像検索結果には潜在的に非常に多くの複製物、またはほぼ複製物のコンテンツが存在する。このような複製物は、それらの全体的なコンテンツおよび主観的印象に基づいてユーザによって「本質的に同じ」であると見なされることになる。たとえば複製物映像コンテンツは、同一またはほぼ同一のコンテンツであるが異なるファイルフォーマットである、異なる符号化パラメータを有する、および／または長さが異なる、映像シーケンスを含み得る。他の差異としては、色および／もしくは照明の変更などの光度測定的な変動、ならびに／またはキャプション、ロゴ、および／もしくは縁取りの追加または改変などの空間および時間ドメインでの軽微な編集動作の場合がある。これらの例は網羅的なリストを意図するものではく、複製物映像には他のタイプの差異も生じ得る。 It's easy for users to upload videos to the host, get the videos, make some corrections and distribute them again, so the video search results will have potentially very many or nearly duplicate content Exists. Such replicas would be considered “essentially the same” by the user based on their overall content and subjective impressions. For example, the duplicate video content may include video sequences that are the same or nearly the same content but in different file formats, have different encoding parameters, and / or have different lengths. Other differences include photometric variations, such as color and / or lighting changes, and / or minor editing operations in the space and time domains, such as adding or modifying captions, logos, and / or borders There is. These examples are not intended to be exhaustive lists, and other types of differences may occur in the duplicate video.

複製物映像の急増により、ユーザが実際に欲しいコンテンツをユーザが見つけることが難しくまたは不便になり得る。例として、ＹｏｕＴｕｂｅ、ＧｏｏｇｌｅＶｉｄｅｏ、Ｙａｈｏｏ！Ｖｉｄｅｏからのサンプル的な照会に基づくと、検索結果にリストされた中で平均で２７％より多いほぼ複製物の映像が存在することが分かり、人気のある映像は結果内で最も複製されたものである。検索結果における高い比率の複製物映像を前提として、ユーザは検索結果をより分けてユーザが必要とする映像を見つけるのにかなりの時間をかけなければならず、すでに見た映像の類似のコピーを繰り返し視聴しなければならない。複製物の結果は、映像検索、取り出し、およびブラウジングのユーザの体験の価値を低下させる。さらにこのような複製された映像コンテンツは、ネットワークに全体にわたって複製された映像データを記憶し、伝送することによりネットワークオーバヘッドを増加させる。 The proliferation of duplicate images can make it difficult or inconvenient for users to find the content they actually want. For example, youtube, google video, Yahoo! Based on a sample query from Video, we can see that on average, there are nearly 27% of the videos listed in the search results, with the most popular being the most duplicated in the results. It is. Given a high proportion of duplicate video in the search results, the user must spend a considerable amount of time separating the search results to find the video that the user needs, and make a similar copy of the video already seen. You have to watch it repeatedly. The duplicate results reduce the value of the video search, retrieval, and browsing user experience. Furthermore, such replicated video content increases network overhead by storing and transmitting the replicated video data across the network.

１つのタイプの映像コピー検出技法はシーケンスマッチングである。シーケンスマッチングでは、複数のフレームを有するある時間間隔が、クエリ映像と目標映像の類似性を比較する基準となる。通常、これは、クエリ映像フレームおよび目標映像フレームの両方から特性のシーケンスを抽出するものであり、これはたとえば順序、動き、色、および図心をベースとする特性とすることができる。抽出された特性シーケンスは次いで、映像間の類似性距離を求めるために比較される。たとえば順序識別特性が用いられる場合は、各映像フレームは最初にＮ１×Ｎ２ブロックに分割され、各ブロックの平均の明度が計算される。次いで各フレームに対して、ブロックがそれらの平均の明度に従ってランク付けされる。ランキング順位はそのフレームの順序尺度と見なされる。一方の映像に対する順序尺度のシーケンスは、他方のそれと比較されて両者の類似性が評価される。 One type of video copy detection technique is sequence matching. In sequence matching, a certain time interval having a plurality of frames is a reference for comparing the similarity between the query video and the target video. This typically extracts a sequence of characteristics from both the query video frame and the target video frame, which can be based on, for example, order, motion, color, and centroid. The extracted feature sequences are then compared to determine the similarity distance between the videos. For example, when the order identification characteristic is used, each video frame is first divided into N1 × N2 blocks, and the average brightness of each block is calculated. For each frame, the blocks are then ranked according to their average brightness. The ranking order is considered as an order measure for that frame. The sequence of the order scale for one video is compared with that of the other to evaluate the similarity between them.

シーケンスマッチングは、複製物映像間での重複位置の始まりを確定することを可能にする。シーケンスマッチング手法は、ほとんど同一な映像、ならびにコーディングおよびフレーム分解能変更などのフォーマット変更を有する映像のコピー、および空間および時間ドメインでの軽微な編集を有するものを識別するのに適している。具体的には、空間的および時間的な順序識別特性を用いることにより、映像デジタル化／符号化プロセス（たとえば色、輝度、およびヒストグラム等化、符号化パラメータにおける変化）、および表示フォーマット変換（たとえばレターボックスまたはピラーボックスへの変換）、および部分的コンテンツの変更（たとえばクロッピングおよびズームイン）によって導入された映像歪みの検出が可能になる。 Sequence matching makes it possible to determine the beginning of the overlap position between duplicate images. The sequence matching approach is suitable for identifying almost identical videos, as well as copies of videos with format changes such as coding and frame resolution changes, and those with minor edits in the spatial and temporal domains. Specifically, by using spatial and temporal order identification characteristics, video digitization / coding processes (eg, color, brightness, and histogram equalization, changes in coding parameters), and display format conversion (eg, Detection of video distortion introduced by conversion to letterbox or pillarbox) and partial content changes (eg cropping and zooming in).

シーケンスマッチング技法は必要な計算が比較的容易であり、特に順序尺度を用いたときにフレームのコンパクトな表示をもたらす。シーケンスマッチングは計算の効率が良い傾向があり、ライブ映像を処理するためにリアルタイムの計算を実行することができる。たとえば１つのフレームの２×２分割を用いた順序尺度は、各フレームを表すのに４次元しか必要とせず、２つのフレーム間の必要な比較点がより少ない。 The sequence matching technique is relatively easy to compute and results in a compact display of the frame, especially when using an ordinal measure. Sequence matching tends to be computationally efficient, and real-time calculations can be performed to process live video. For example, an ordinal measure using a 2 × 2 split of one frame requires only four dimensions to represent each frame and requires fewer comparison points between the two frames.

しかし、既存のシーケンスマッチングをベースとする技法は、フレームの挿入、削除、または置換などのフレームシーケンスにおける変更が存在する複製物映像クリップを検出することができない。フレームシーケンスの変更はユーザ編集によって、またはたとえば動画共有ウェブサイトによる映像へのコマーシャルの挿入によって導入される。ユーザ修正のタイプを予め推定することは実行可能ではないので、フレームシーケンス変更を検出する能力がないことにより、シーケンスマッチング技法の現実の問題への適用可能性は限定される。 However, existing sequence matching-based techniques cannot detect duplicate video clips that have changes in the frame sequence such as frame insertion, deletion, or replacement. Frame sequence changes are introduced by user editing or by inserting commercials into the video, for example by a video sharing website. Since it is not feasible to pre-estimate the type of user modification, the inability to detect frame sequence changes limits the applicability of sequence matching techniques to real problems.

フレームの挿入、削除、または置換などのフレームシーケンス改変を有する複製物映像を検出するための既存の解決策は、キーフレームマッチング技法に基づく。 Existing solutions for detecting duplicate images with frame sequence modifications such as frame insertion, deletion, or replacement are based on key frame matching techniques.

キーフレームマッチング技法は、通常は、映像を表すように、映像を一連のキーフレームに区分化する。各キーフレームは次いで領域に分割され、目立った局所領域から特性が抽出される。特性はたとえば、各領域に対する色、テクスチャ、角部、または形状特性とすることができる。キーフレームマッチングは、フレームの時間的順序における変更または挿入／削除など、かなりの程度の編集を受けた近似するコピーを検出することができる。しかしキーフレーム内には全く多くの局所的特性があるので、キーフレームを識別し、各キーフレームから局所的特性を抽出し、映像クリップをデータベース内の大量の映像とマッチングさせるためにそれらの間の計量距離（ｍｅｔｒｉｃｄｉｓｔａｎｃｅ）の比較を行うのは計算法的に費用がかかる。 Key frame matching techniques typically segment the video into a series of key frames to represent the video. Each key frame is then divided into regions and characteristics are extracted from the prominent local regions. The characteristic can be, for example, a color, texture, corner, or shape characteristic for each region. Keyframe matching can detect approximate copies that have undergone a significant degree of editing, such as changes in the temporal order of frames or insertion / deletion. But there are quite a few local characteristics within a keyframe, so keyframes are identified, local characteristics are extracted from each keyframe, and between them to match a video clip with a large amount of video in the database. It is computationally expensive to compare the metric distances.

最近の研究は、特性ベクトルの高速指標付けにより、または統計情報を用いて特性ベクトルの次元を低くすることにより、キーフレームマッチング方法の速度を改善することに向けられている。しかし、オンライン分析の場合は、映像をキーフレームに区分化するコスト、およびクエリ映像から局所的特性を抽出するコストの両方は依然として避けられない。Ｗｅｂ２．０画像共有環境におけるオンラインリアルタイム映像複製検出を実現することが現実の課題となる。キーフレームマッチング手法は、データベース映像を集約し分類するためのきめの細かい分析を用いたオフライン映像冗長性検出に、より適している。 Recent work has been directed to improving the speed of keyframe matching methods by fast indexing of feature vectors or by reducing the dimension of feature vectors using statistical information. However, in the case of online analysis, both the cost of segmenting the video into key frames and the cost of extracting local features from the query video are still inevitable. Realizing online real-time video replication detection in a Web 2.0 image sharing environment is a real challenge. The key frame matching technique is more suitable for off-line video redundancy detection using fine-grained analysis to aggregate and classify database videos.

本発明の第１の態様によれば、クエリ映像と目標映像を比較する方法は、クエリ映像のフレームおよび目標映像のフレームを複数のブロックに分割するステップと、各ブロックに対する平均明度値を計算するステップとを含む。クエリ映像に対する複数のクエリ時系列が生成され、各クエリ時系列はクエリ映像の異なるフレーム内の同じ場所からのブロックに対する平均明度値の時間的変動を表す。目標映像に対する目標時系列が生成され、各目標時系列は目標映像の異なるフレーム内の同じ場所からのブロックに対する平均明度値の時間的変動を表す。クエリ時系列および目標時系列は、クエリ映像と目標映像の間にアラインメントが存在するかどうかの判定に用いられる。本発明を用いることにより、類似性を求めて比較することができる時系列を生成することができる。複製物映像はそれらのそれぞれの時系列において類似性を示し、これはそれらが関係していることを識別するために用いることができる。本発明による方法は、２つの映像間の比較空間を低減することによって効率的な映像複製検出をもたらす。 According to the first aspect of the present invention, a method for comparing a query video and a target video includes dividing a query video frame and a target video frame into a plurality of blocks, and calculating an average brightness value for each block. Steps. A plurality of query time series for the query video is generated, each query time series representing a temporal variation of the average brightness value for blocks from the same location in different frames of the query video. A target time series for the target video is generated, each target time series representing the temporal variation of the average brightness value for blocks from the same location in different frames of the target video. The query time series and the target time series are used to determine whether or not there is an alignment between the query video and the target video. By using the present invention, it is possible to generate a time series that can be compared for similarity. The duplicate images show similarities in their respective time series, which can be used to identify that they are related. The method according to the invention provides efficient video duplication detection by reducing the comparison space between two videos.

一実施形態は、クエリ時系列および目標時系列をそれぞれ１組の離散的な線形区分に区分化するステップと、それらの線形区分の局所的シーケンスアラインメントを行うステップとを含む。線形区分化は、平均映像明度を線形上昇部／下降部の離散的なリストに圧縮することを可能にし、次いでそれらはアラインメントについて比較することができる。 One embodiment includes partitioning the query time series and the target time series into a set of discrete linear sections, respectively, and performing local sequence alignment of the linear sections. Linear segmentation allows the average video brightness to be compressed into a discrete list of linear ascending / descending parts, which can then be compared for alignment.

複製物映像では、重複する映像領域は通常は、映像シーケンスの全体の長さに跨がることはなく、同様な領域は分離され得る。したがって線形区分の局所的アラインメントが必要になる。バイオインフォマティクスでは２つのヌクレオチドまたはタンパク質配列間での類似した領域を判定するための、Ｓｍｉｔｈ−Ｗａｔｅｒｍａｎアルゴリズムがよく知られている。Ｓｍｉｔｈ−Ｗａｔｅｒｍａｎアルゴリズムは、すべての可能な長さのストリング区分を比較し、類似性尺度を最適化する。本発明者らは、映像明度区分に対する局所的アラインメントを行うように、Ｓｍｉｔｈ−Ｗａｔｅｒｍａｎアルゴリズムを拡張できることを認識した。ストリングを比較する代わりに、映像間の局所的最適アラインメントを見出すために明度線形区分が比較される。 In replicated video, overlapping video regions typically do not span the entire length of the video sequence, and similar regions can be separated. Therefore, a local alignment of linear sections is required. In bioinformatics, the Smith-Waterman algorithm is well known for determining similar regions between two nucleotide or protein sequences. The Smith-Waterman algorithm compares all possible length string segments and optimizes the similarity measure. The inventors have recognized that the Smith-Waterman algorithm can be extended to perform local alignment for video brightness classification. Instead of comparing the strings, the lightness linear segments are compared to find a local optimal alignment between the images.

Ｓｍｉｔｈ−Ｗａｔｅｒｍａｎアルゴリズムは、最適化された検索をもたらすための動的計画法アルゴリズムである。これは時間的およびメモリリソースの要求が相当に厳しく、計算の複雑さはＯ（ＭＮ）であり、記憶容量はＯ（ｍｉｎ（Ｍ，Ｎ））であり、ただしＭおよびＮは比較を受けるシーケンスの長さである。 The Smith-Waterman algorithm is a dynamic programming algorithm for producing optimized searches. This is quite time and memory resource demanding, computational complexity is O (MN), storage capacity is O (min (M, N)), where M and N are sequences to be compared Is the length of

検索プロセスを加速するために一実施形態では、すべての明度区分をアラインメントする代わりに、比較される映像の重要な識別特性を表すものとして、主要上昇部／下降部のシーケンスが選択される。発見的方法は、より時間のかかるＳｍｉｔｈ−Ｗａｔｅｒｍａｎアルゴリズムを行う前に、成功するアラインメントを結果として生じそうもないアラインメントを削除することによって、それらの主要上昇部／主要下降部の高速なアラインメントをもたらすために適用される。これにより計算のコストが低減される。発見的方法は、非常に異なる映像を除去することにより、および類似した映像に対する潜在的にマッチする領域に絞り込むことによってマッチングアルゴリズムの実行を促進する。 To accelerate the search process, in one embodiment, instead of aligning all lightness segments, the primary ascending / descending sequence is selected as representing the key distinguishing characteristics of the images being compared. Heuristics provide fast alignment of their major ascents / major declines by removing alignments that are unlikely to result in a successful alignment before doing the more time-consuming Smith-Waterman algorithm Applied for. This reduces the calculation cost. Heuristics facilitate the execution of matching algorithms by removing very different videos and by narrowing down to potentially matching areas for similar videos.

本発明による一実施形態は、映像複製検出技法を適用する前に予めユーザ修正のタイプを知ることが実行可能でない場合に有利であり、シーケンスマッチング技法を用いることを可能にする。さらにこれはシーケンスマッチング手法の使用の利点を保持し、効率的な検出をもたらす。 One embodiment according to the present invention is advantageous when it is not feasible to know the type of user modification in advance before applying the video duplicate detection technique, and allows a sequence matching technique to be used. This further retains the advantages of using sequence matching techniques and provides efficient detection.

本発明による一実施形態を用いたフレーム変更を有する複製物映像の検出は、ユーザフィーチャーとして動画共有ウェブサイトによって使用可能である、またはロイヤルティ支払いを追跡し、および可能性のある著作権侵害を検出するために映像コンテンツプロバイダによって使用可能である、または通信「パイプ」（たとえばインターネットサービスプロバイダ（ＩＳＰ）、ピアツーピア（Ｐ２Ｐ）システムプロバイダ、コンテンツ配給ネットワーク（ＣＤＮ））によってネットワークトラフィックを低減し映像コンテンツの保管を管理するために使用可能である。これは、ユーザが検索、取り出し、およびブラウジングするためのサービスを提供するように、ほぼ複製物の映像の除去または集約において動画共有ウェブサイトを支援し得る。これはまた、たとえば高品質（ＨＤ）または３Ｄを有する類似した映像を検出することによって、映像コンテンツベースの検索を容易にする。 Detection of duplicate video with frame changes using an embodiment according to the present invention can be used by a video sharing website as a user feature, or track royalty payments and detect possible piracy. Can be used by video content providers to detect or reduce network traffic by communication “pipes” (eg Internet service providers (ISPs), peer-to-peer (P2P) system providers, content distribution networks (CDNs)) Can be used to manage storage. This may assist the video sharing website in removing or aggregating the video of nearly duplicates so that the user can provide services for searching, retrieving, and browsing. This also facilitates video content-based searching, for example by detecting similar videos with high quality (HD) or 3D.

既存の映像複製システムは、本発明による一実施形態を含めることによって、フレーム挿入、削除、または置換などのユーザ修正を取り扱う能力を向上させることができる。 Existing video duplication systems can improve their ability to handle user modifications such as frame insertion, deletion, or replacement by including an embodiment according to the present invention.

本発明の第２の態様によれば、装置は第１の態様による方法を実行するようにプログラムされまたは構成される。 According to a second aspect of the invention, the apparatus is programmed or configured to perform the method according to the first aspect.

本発明の第３の態様によれば、第１の態様による方法を実行するための機械実行可能なプログラムを記憶したデータ記憶媒体が提供される。 According to a third aspect of the present invention there is provided a data storage medium storing a machine executable program for performing the method according to the first aspect.

次に本発明のいくつかの実施形態について、例のみとして添付の図面を参照して説明する。 Several embodiments of the present invention will now be described by way of example only with reference to the accompanying drawings.

比較されるべき映像、および比較処理における一段階を概略的に示す図である。It is a figure which shows roughly the one step in the image | video which should be compared and a comparison process. 本発明による方法を概略的に示す図である。Fig. 2 schematically shows a method according to the invention. １つのブロックに対する明度の時間変化を概略的に示すグラフである。It is a graph which shows roughly the time change of the lightness with respect to one block. 線形区分化を概略的に示すグラフである。3 is a graph schematically showing linear segmentation. 比較される映像に対する明度の変化を概略的に示す図である。It is a figure which shows schematically the change of the brightness with respect to the image | video to be compared. 図２の方法で用いられるマトリックスを概略的に示す図である。FIG. 3 schematically shows a matrix used in the method of FIG. 図２の方法で用いられるマッチングにおけるステップを概略的に示す図である。FIG. 3 schematically shows the steps in matching used in the method of FIG. 図２の方法で用いられるマッチングにおけるステップを概略的に示す図である。FIG. 3 schematically shows the steps in matching used in the method of FIG. 本発明による装置を概略的に示す図である。Fig. 1 schematically shows a device according to the invention.

図１を参照すると、複数のフレームを有するクエリ映像１は、それらが複製物であるかどうかを判定するために１つまたは複数の目標映像と比較される。 Referring to FIG. 1, a query video 1 having multiple frames is compared with one or more target videos to determine if they are duplicates.

図２を参照すると、２ではクエリ映像１内の各フレームはＮ１×Ｎ２ブロックに分割される。３では各ブロックに対する平均明度値が計算される。各フレームを分割することにより、分割された部分領域内の明度変化の変動が保持される。４では各ブロックについて、計算された平均明度値がフレーム番号に対してプロットされてクエリ時系列が生成される。この実施形態では、映像１に関連するＮ１×Ｎ２の時系列を作成するために、すべてのブロックが処理される。他の実施形態では選択されたブロックのみが必要であり、したがって結果としてＮ１×Ｎ２より少ない時系列が生成される。 Referring to FIG. 2, in 2, each frame in the query video 1 is divided into N1 × N2 blocks. At 3, the average brightness value for each block is calculated. By dividing each frame, the variation of the brightness change in the divided partial area is maintained. At 4, for each block, the calculated average brightness value is plotted against the frame number to generate a query time series. In this embodiment, all blocks are processed to create an N1 × N2 time series associated with video 1. In other embodiments, only selected blocks are needed, and as a result fewer time series than N1 × N2 are generated.

比較のために、図１に示される目標映像５はクエリ映像１に基づくが、ヒストグラム等化、輝度の追加、ならびに縁取りおよびフレーム削除によって修正済みである。目標映像５が上述と同じやり方で処理されると、６に示される目標時系列が得られる。目標映像５からのブロックに対する明度の変化は、映像１のそれと形において全体的に同様となることが分かる。たとえば４でのクエリ時系列に対するフレーム番号８０６では１つのブロックに対する平均明度が増加するが、別のブロックのそれは減少し、それによりそれらは交差する。同様な交差は、６での目標時系列に対するフレーム７３９に見ることができる。 For comparison, the target video 5 shown in FIG. 1 is based on the query video 1 but has been modified by histogram equalization, luminance addition, and bordering and frame deletion. When the target video 5 is processed in the same manner as described above, the target time series shown in 6 is obtained. It can be seen that the change in brightness for the block from the target video 5 is generally similar in shape to that of video 1. For example, frame number 806 for the query time series at 4 increases the average brightness for one block, but decreases that of another block, thereby crossing them. A similar intersection can be seen at frame 739 for the target time series at 6.

図２の７での次のステップは、部分線形区分化技法を用いることにより、クエリおよび目標時系列における時間的変化によってもたらされる情報を捕捉することである。時系列を区分化することによって映像は圧縮され、映像明度の時間的変化における本質的な情報の大部分が捕捉される。ユーザ修正、映像歪み、およびフォーマット変換により、映像複製物検出において正確なマッチを見出すことは期待されず、時間的明度の軽微な変化を無視することにより映像複製物検出プロセスは比較的ノイズの影響を受けにくくなる。 The next step at 7 in FIG. 2 is to capture information resulting from temporal changes in the query and target time series by using a partial linear segmentation technique. By segmenting the time series, the video is compressed and most of the essential information in the temporal change in video brightness is captured. User correction, video distortion, and format conversion are not expected to find an exact match in video replica detection, and the video replica detection process is relatively noisy by ignoring minor changes in brightness over time. It becomes difficult to receive.

図３ａは、図１の４または６に示されるものなどの、１つの時系列の一部の平均明度における変動を示す。図３ｂは、線形区分化が適用された後の図１ａに示される時系列の一部を示す。 FIG. 3a shows the variation in the average brightness of a portion of one time series, such as that shown in 4 or 6 of FIG. FIG. 3b shows a portion of the time series shown in FIG. 1a after linear segmentation has been applied.

時系列を区分するためにはボトムアップアルゴリズムが用いられる。ボトムアップ手法はよく知られた、時系列における近似アルゴリズムである。これはできる限り微細な近似から始めて、終了基準が満たされるまで反復的に区分をマージする。この場合は近似ラインを検出するのに線形回帰ではなく線形補間が用いられ、なぜなら線形補間は複雑さの低い計算を用いて一定の時間内に得られるからである。潜在的な区分の適合品質は残留誤差を用いて評価される。残留誤差は、最良適合ラインと実際のデータ点のすべての縦方向の差を取り、それらを平方し、次いでそれらを合計することによって計算される。 A bottom-up algorithm is used to segment the time series. The bottom-up method is a well-known approximation algorithm in time series. This starts with the finest possible approximation and iteratively merges the partitions until the termination criteria are met. In this case, linear interpolation is used instead of linear regression to detect the approximate line, because linear interpolation is obtained within a certain time using a low complexity calculation. The quality of fit of potential categories is evaluated using residual errors. The residual error is calculated by taking all the longitudinal differences between the best fit line and the actual data points, squaring them and then summing them.

他の実施形態では時系列の高速線形区分化は、極値点としての主要極大点および主要極小点の抽出を用いた補間方法によって達成される。図４ａは極大点および極小点を用いた線形近似を示す。しかし発明者らは、これらの点のみに依存すると８に示されるものなどのジャンプ点が除外されることを認識した。ジャンプ点は、たとえば短い時間距離内の上向きまたは下向きのジャンプなど、値の急な変化に対応する。映像ブロック系列の明度曲線の場合は、これらのジャンプは通常は場面境界を示し、ハードカットまたはフェードイン／アウトによって引き起こされる。したがってこの実施形態では線形区分化技法は、ジャンプ点も含むように拡張され、それにより線形区分化方法に用いられる極値点は、図４ｂに示されるように極大点、極小点、およびジャンプ点となる。 In other embodiments, fast linear segmentation of the time series is achieved by an interpolation method using extraction of major maxima and major minima as extreme points. FIG. 4a shows a linear approximation using local maxima and minima. However, the inventors have recognized that depending on these points alone, jump points such as those shown in 8 are excluded. A jump point corresponds to a sudden change in value, for example an upward or downward jump within a short time distance. In the case of a brightness curve of a video block sequence, these jumps usually indicate scene boundaries and are caused by hard cuts or fade-in / out. Thus, in this embodiment, the linear segmentation technique is extended to include jump points, so that the extreme points used in the linear segmentation method are the maximum, minimum, and jump points as shown in FIG. 4b. It becomes.

時系列の線形区分化の後に、顕著な映像識別特性をもたらすものとして時系列内の主要上昇部／下降部が９で選択される。これにより線形区分をアラインメントするための探索空間を縮小することができる。 After the time series linear segmentation, the main ascending / descending parts in the time series are selected at 9 to provide significant video discrimination characteristics. This can reduce the search space for aligning linear segments.

より長い距離およびより深みのある高さを有する線形区分は通常は、情景の顕著な変化を表す。それらはしたがって主要上昇部として選択される。連続する主要上昇部のマッチは、同じ主要な情景変化のシーケンスを有する類似の挙動に従う映像コピーを示す。これと対照的に深みのある高さであるが非常に長さの短い線形区分は通常は、ハードカットまたはフェードなどの場面境界に関連する。このような線形区分はしばしば、情景内の変化を表すものより少ない情報を含む。すべての分割されたブロックからの線形区分が、同じ時間（すなわち同じ開始フレームＩＤ）に生じる同じ短い距離内に深みのある高さを有する場合は、場面境界と判定される。場面境界を表すこれらの線形区分は、主要上昇部を選択するプロセスにおいて無視される。 Linear segments with longer distances and deeper heights usually represent significant changes in the scene. They are therefore selected as the main climb. Successive major ascending matches show video copies following similar behavior with the same major scene change sequence. In contrast, deep, but very short linear segments are usually associated with scene boundaries such as hard cuts or fades. Such linear sections often contain less information than those that represent changes in the scene. A linear boundary from all the divided blocks is determined to be a scene boundary if it has a deep height within the same short distance that occurs at the same time (ie, the same starting frame ID). These linear segments representing scene boundaries are ignored in the process of selecting the main ascending part.

１２では、図５に示されるように、成功するアラインメントに繋がりそうな、連続するマッチした上昇部／下降部を有する近似的アラインメントを検出するために、クエリ映像と目標映像の主要上昇部／下降部が比較される。図６を参照すると、Ｍ１×Ｍ２のマトリックスが発生され、ただしＭ１およびＭ２は比較を受ける主要上昇部／下降部シーケンスの長さである。ｉおよびｊでの２つの主要上昇部／下降部がマッチする場合は、値「１」がマトリックス（ｉ，ｊ）に置かれる。線形区分Ｓ_１［ｉ_１，…，ｊ_１］と区分Ｓ_２［ｉ_２，…，ｊ_２］の間の類似性をチェックするために、区分の高さと長さだけでなく、２つの区分に含まれた映像フレームの類似性も考慮する。より正確には、以下の場合には、これらの２つの区分は類似である： 12, the primary ascending / descending of the query video and the target video to detect an approximate alignment with successive matched ascending / descending parts that are likely to lead to a successful alignment, as shown in FIG. Parts are compared. Referring to FIG. 6, an M1 × M2 matrix is generated, where M1 and M2 are the lengths of the main ascending / descending sequences that are compared. If the two major ascent / descent parts at i and j match, the value “1” is placed in the matrix (i, j). To check the similarity between the linear segment S ₁ [i ₁ ,..., J ₁ ] and the segment S ₂ [i ₂ ,..., J ₂ ], not only the segment height and length, but also the two segments Also consider the similarity of the video frames included in. More precisely, these two categories are similar if:

すなわち２つの区分は同様な長さである。この実装形態では、ｒａｔｉｏ_Ｌ＝０．９である。

That is, the two sections are similar in length. In this implementation, ratio _L = 0.9.

すなわち２つの区分は同様な長さである。この実装形態では、ｒａｔｉｏ_Ｈ＝０．７５である。

That is, the two sections are similar in length. In this implementation, ratio _H = 0.75.

ｍｉｎ_ｐＤ（ｐ）≦ｄｉｓｔ言い換えれば、２つの対応するフレームシーケンス間の最小距離は、より短いシーケンスをより長いシーケンスに沿って「スライド」させたときに最大でも閾値定数ｄｉｓｔであり、ただしｐは長い方の映像内でのスライドするフレーム位置の開始点にわたる範囲である。この実施形態ではその効率性と精度により、映像類似性距離を計算するために空間的および時間的な順序識別特性アルゴリズムを選択する。 min _p D (p) ≦ dist In other words, the minimum distance between two corresponding frame sequences is at most the threshold constant dist when “sliding” a shorter sequence along a longer sequence, where p Is the range over the starting point of the sliding frame position in the longer image. In this embodiment, due to its efficiency and accuracy, a spatial and temporal order identification characteristic algorithm is selected to calculate the video similarity distance.

２つのフレームシーケンスをＦ_１およびＦ_２として、順序識別特性測定値は以下の２つのフレームシーケンスＦ_１とＦ_２の間の距離を計算する： Given the two frame sequences F ₁ and F ₂ , the sequence identification characteristic measurement calculates the distance between the following two frame sequences F ₁ and F ₂ :

ただし、Ｌ＝ｊ_１−ｉ_１は、短い方のシーケンスの長さである。

However, L = j ₁ -i ₁ is the length of the shorter sequence.

ユーザ修正および映像処理技法は、ヒストグラム等化、フレームサイズ変更またはクロッピング、輝度／色／色相の変更、他の付加されたノイズなどの映像明度値に差違を引き起こし得るので、同様な明度の線形区分の高さは異なり得る。同様な線形区分の距離はまた、線形区分近似誤差、またはユーザによって導入された他のノイズにより異なり得る。パラメータｒａｔｉｏ_Ｈおよびｒａｔｉｏ_Ｌを用いることにより、これらのノイズに対してある程度の許容が可能になる。ここでは２つのフレームシーケンスの距離を計算するために順序識別特性をベースとする測定値Ｄ（ｐ）が用いられたが、映像フレームのマッチングは、シーケンスマッチングまたはキーフレームをベースとするマッチングアルゴリズムを用いた、他のグローバル記述子さらにはローカル記述子に基づくものとすることができる。 User correction and video processing techniques can cause differences in video brightness values such as histogram equalization, frame resizing or cropping, brightness / color / hue changes, other added noise, etc. The height of can vary. Similar linear segment distances may also vary due to linear segment approximation errors or other noise introduced by the user. The use of the parameters ratio _H and ratio _L allows some tolerance for these noises. Here, the measured value D (p) based on the order identification characteristic is used to calculate the distance between the two frame sequences, but the matching of the video frame is performed using a matching algorithm based on sequence matching or key frame. It can be based on other global descriptors or even local descriptors used.

主要上昇部をアラインメントした後に図７に示されるように、さらにアラインメントした線形区分を検出するために潜在的な主要上昇部アラインメントは、隣接の主要でない上昇部に拡張される。このステップは、次の段階でＳｍｉｔｈ−Ｗａｔｅｒｍａｎアルゴリズムを適用するのに必要な比較の数を低減するために、不要なアラインメントを除去する。 As shown in FIG. 7 after aligning the main ascent, the potential main ascent alignment is expanded to the adjacent non-major ascent to detect further aligned linear segments. This step removes unnecessary alignments in order to reduce the number of comparisons required to apply the Smith-Waterman algorithm in the next stage.

次のステップでは重要な近似的アラインメントを検出するために、発明者らは、アラインメントは、類似したＤＮＡおよびタンパク質配列の検出に用いられる高速検索アルゴリズムであるＦＡＳＴＡによってもたらされるものと同様の手法を用いて実行できることを認識した。図８（ａ）に示されるように、マトリックス内の連続する値「１」のすべての対角線が識別される。次に図８（ｂ）に示されるように、長さが予め規定された閾値より長い対角線が保持され、単一のマッチおよび短いアラインメントした区分は無視される。次いで図８（ｃ）に示されるように上位Ｋ個の最長対角線が選択される。アラインメントの全体の長さを延長するために、より長い区分を形成するように互いに近い上位Ｋ個の対角線のそれらの区分同士を結合することが試みられる。結合された、より長い区分内には、フレーム挿入、削除、および置換を考慮に入れるためにギャップが許容される。 To detect important approximate alignments in the next step, we use an approach similar to that provided by FASTA, a fast search algorithm used to detect similar DNA and protein sequences. Recognized that it can be executed. As shown in FIG. 8 (a), all diagonal lines of consecutive values “1” in the matrix are identified. Next, as shown in FIG. 8 (b), diagonals whose length is longer than a predefined threshold are retained and single matches and short aligned segments are ignored. Next, as shown in FIG. 8C, the top K longest diagonal lines are selected. In order to extend the overall length of the alignment, an attempt is made to join those sections of the top K diagonals close to each other to form longer sections. Within the combined longer section, gaps are allowed to allow for frame insertion, deletion, and replacement.

隣接する対角線を接続するときに、対角線のマッチするラインには加点スコアが割り当てられ、ギャップすなわちミスマッチには減点スコアが与えられる。接続された対角線のそれぞれの加点スコアを加算し、ギャップ減点を減算することによってスコアが得られる。図８（ｄ）に示されるように、連結された近似的アラインメントのスコアが所与の閾値を超える場合は、連結された区分の周りの、前に無視された初期の短いアラインメントした区分を結合して、ギャップを有する近似的アラインメントを形成できるかどうかを判定するためにチェックが行われる。最後に、閾値を超える最終スコアを有する局所的な近似的アラインメントが、さらなる審査のために選択される。 When adjacent diagonals are connected, points with matching diagonals are assigned a scoring score, and gaps or mismatches are given a deduction score. A score is obtained by adding the score added to each of the connected diagonals and subtracting the gap deduction. As shown in Figure 8 (d), if the concatenated approximate alignment score exceeds a given threshold, then join the previously ignored initial short aligned segment around the concatenated segment A check is then made to determine if an approximate alignment with a gap can be formed. Finally, a local approximate alignment with a final score that exceeds the threshold is selected for further review.

１５での次の段階は、Ｓｍｉｔｈ−Ｗａｔｅｒｍａｎアルゴリズムを適用することにより、比較される映像のすべての明度線形区分のきめの細かいアラインメントが行われる。前に検出された主要上昇部／下降部の近似的アラインメントに基づいて、成功するアラインメントに繋がり得る線形明度区分のリストを確定することができる。Ｓｍｉｔｈ−Ｗａｔｅｒｍａｎアルゴリズムは、線形区分の限られた範囲を調べるだけでよい。 The next step at 15 is a fine-grained alignment of all lightness linear segments of the compared video by applying the Smith-Waterman algorithm. A list of linear lightness segments that can lead to a successful alignment can be determined based on the previously detected approximate alignment of the major ascending / descending parts. The Smith-Waterman algorithm need only look at a limited range of linear segments.

Ｓｍｉｔｈ−Ｗａｔｅｒｍａｎアルゴリズムは、最適アラインメントを検出するために編集距離を用いる。これは以下のようにスコア付けマトリックスＨを構築する：
Ｈ（ｉ，０）＝０、０≦ｉ≦Ｍ
Ｈ（０，ｊ）＝０、０≦ｊ≦Ｎ The Smith-Waterman algorithm uses edit distances to detect optimal alignment. This builds a scoring matrix H as follows:
H (i, 0) = 0, 0 ≦ i ≦ M
H (0, j) = 0, 0 ≦ j ≦ N

０≦ｉ≦Ｍ、０≦ｊ≦Ｎ
ただしｘおよびｙは、アラインメントする可能性のある線形区分のリスト、ＭおよびＮはｘおよびｙシーケンスの長さ、ω（ｘ_ｉ，ｙ_ｊ）はスコア付けスキームである。ｘ_ｉとｙ_ｊがマッチする場合はω（ｘ_ｉ，ｙ_ｊ）は正であり、それらがマッチしない場合は負である。挿入および削除に対しては、ω（ｘ_ｉ，−）およびω（−，ｙ_ｊ）は負である。

0 ≦ i ≦ M, 0 ≦ j ≦ N
Where x and y are lists of linear segments that may be aligned, M and N are the lengths of the x and y sequences, and ω (x _i , y _j ) is the scoring scheme. When x _i and y _j match, ω (x _i , y _j ) is positive, and when they do not match, it is negative. For insertions and deletions, ω (x _i , −) and ω (−, y _j ) are negative.

Ｓｍｉｔｈ−Ｗａｔｅｒｍａｎアルゴリズムは、マトリックスＨ内の極大スコアを検索することによって局所的アラインメントを検出し、次いでマトリックスを構築するために用いられる動きの方向に応じて最適経路を遡る。これは０のスコアに達するまでこの処理を維持する。局所的最適アラインメントが得られた後に１６では、マッチする線形区分を求める既存のシーケンスマッチング技法を適用することによって映像類似性距離が計算される。この実施形態では、映像類似性距離を求めるために２×２分割を有する順序尺度が用いられる。１７で距離が閾値より小さいことが分かった場合は、２つの比較される映像は複製物であると見なされる。 The Smith-Waterman algorithm detects local alignments by searching for local maxima in the matrix H and then follows the optimal path depending on the direction of motion used to construct the matrix. This keeps this process until a score of 0 is reached. At 16 after the local optimal alignment is obtained, the video similarity distance is calculated by applying existing sequence matching techniques to find a matching linear segment. In this embodiment, an order scale having a 2 × 2 split is used to determine the video similarity distance. If it is found at 17 that the distance is less than the threshold, the two compared images are considered to be duplicates.

次に１８では、線形区分に対して、線形区分レベルの代わりに映像フレームレベルでのアラインメントが調べられる。最適の局所的アラインメントは明度線形区分に基づくので、区分内でフレーム変化が生じる場合は、上述のようにＳｍｉｔｈ−Ｗａｔｅｒｍａｎアルゴリズムを用いて区分全体がマッチでないと見なされる。マッチしない区分内で潜在的なマッチング位置を検出するために、フレームレベルの類似性距離を計算するためにフレーム対フレーム比較が行われる。フレーム類似性距離が、Ｓｍｉｔｈ−Ｗａｔｅｒｍａｎアルゴリズムを用いて得られる映像類似性距離より小さい場合は、それらのフレームはマッチすると見なされる。これは、それらのマッチしない区分内のマッチするフレームの類似性距離が残りのマッチした区分から得られる平均の映像類似性距離を超えないことを確実にする。フレーム比較は、区分の中間に向かって、マッチしない区分の始まりおよび終わりの両方から開始される。マッチングは、フレーム類似性距離が映像類似性距離より大きくなるまで続けられる。次いで映像重複位置が更新される。 Next, at 18, the alignment at the video frame level is examined for the linear segment instead of the linear segment level. Since the optimal local alignment is based on a lightness linear partition, if a frame change occurs within a partition, the entire partition is considered not a match using the Smith-Waterman algorithm as described above. In order to detect potential matching positions within the unmatched section, a frame-to-frame comparison is performed to calculate a frame-level similarity distance. If the frame similarity distance is less than the video similarity distance obtained using the Smith-Waterman algorithm, the frames are considered to match. This ensures that the similarity distance of matching frames in those unmatched segments does not exceed the average video similarity distance obtained from the remaining matched segments. The frame comparison starts at both the beginning and end of the unmatched section toward the middle of the section. Matching is continued until the frame similarity distance is greater than the video similarity distance. Next, the video overlap position is updated.

したがってこの実施形態では、分割されたブロックの明度値は初めに時系列と見なされる。次いで時系列は離散的な線形表示のリストに区分化される。最適マッチング位置を検出するためにそれらの線形区分の局所的シーケンスアラインメントが行われる。次いで潜在的アラインメント位置に基づいて、映像類似性距離が計算される。最良マッチング類似性距離が所与の閾値より小さい場合は、２つの映像は複製物であると見なされる。フレームの変化を取り扱うために、線形シーケンス区分の比較時にはギャップ、フレーム挿入、削除、および置換の結果が存在することは許容される。 Therefore, in this embodiment, the brightness values of the divided blocks are initially considered as time series. The time series is then partitioned into a list of discrete linear representations. A local sequence alignment of these linear segments is performed to find the optimal matching position. A video similarity distance is then calculated based on the potential alignment position. If the best matching similarity distance is less than a given threshold, the two videos are considered duplicates. To handle frame changes, the existence of gaps, frame insertions, deletions, and substitutions is allowed when comparing linear sequence partitions.

図９を参照すると映像管理装置は、映像ファイルを保持するデータベースまたは記憶装置１９を含む。データベース１９は、インターネットを通じてユーザがアクセス可能なもの、またはたとえばアクセスが制限されたライブラリまたは他の保管場所とすることができる。これらの可能なものの代わりにまたはそれらに加えて、他のタイプの記憶装置またはデータベースを用いることができる。 Referring to FIG. 9, the video management apparatus includes a database or storage device 19 that holds video files. Database 19 may be accessible to the user through the Internet, or may be, for example, a library or other storage location with limited access. Other types of storage devices or databases can be used in place of or in addition to these possibilities.

ユーザは、ユーザインターフェース２０を通じて映像Ｑを提出することによって、ユーザがデータベース１９に追加したい映像Ｑを送信する。映像Ｑは、映像データベース１９およびまた分割器２１に送られる。動作の段階１では、分割器２１は映像Ｑの各フレームをＮ１×Ｎ２ブロックに分割する。計算器２２はブロックのそれぞれに対して平均明度値を計算する。 The user submits the video Q through the user interface 20 to transmit the video Q that the user wants to add to the database 19. The video Q is sent to the video database 19 and also to the divider 21. In stage 1 of operation, the divider 21 divides each frame of the video Q into N1 × N2 blocks. Calculator 22 calculates an average brightness value for each of the blocks.

段階２では平均明度値データが計算器２２から区分化器２３によって受け取られる。区分化器２３は、各ブロックの平均明度の変化を区分化する。ソータ２４は次いで区分開始フレームＩＤに基づいてすべてのブロックからの線形区分をソートして、ソートされたリストにする。選択器２５は、ソートされたリストを受け取り、ソートされたリストから主要上昇部／主要下降部を選択する。 In stage 2, average brightness value data is received from the calculator 22 by the segmenter 23. The segmenter 23 segments the change in average brightness of each block. The sorter 24 then sorts the linear segments from all blocks based on the segment start frame ID into a sorted list. The selector 25 receives the sorted list and selects a major ascending / major descending from the sorted list.

次の段階の段階３では、アライナ２６は、クエリ映像の選択された主要上昇部および主要下降部と、同様な処理を受けた１つまたは複数の目標映像のそれらとの間で、近似的マッチを検出することを試みる。結果は第１の比較器２７によってテストされる。所与の閾値パラメータに対して判断されて類似性がなかった場合は、クエリ映像と１つまたは複数の目標映像は複製物ではないと見なされ、複製検出プロセスは２８で終了する。 In the next stage, stage 3, aligner 26 performs an approximate match between the selected primary ascending and descending parts of the query video and those of one or more target videos that have undergone similar processing. Try to detect. The result is tested by the first comparator 27. If there is no similarity as determined for a given threshold parameter, the query video and one or more target videos are considered not duplicates and the duplicate detection process ends at 28.

比較器２７が近似的アラインメントを検出した場合は、段階４でバンド化されたＳｍｉｔｈ−Ｗａｔｅｒｍａｎアルゴリズムがプロセッサ２９によって適用され、結果は同様な類似性距離計算器３０に加えられる。類似性距離計算器３０の出力は、第２の比較器３１によって所与の閾値に対してチェックされる。類似性が不十分である場合は、比較された映像は複製物ではないと見なされ、プロセスは３２で終了する。 If the comparator 27 detects an approximate alignment, the Smith-Waterman algorithm banded in step 4 is applied by the processor 29 and the result is applied to a similar similarity distance calculator 30. The output of the similarity distance calculator 30 is checked against a given threshold by the second comparator 31. If the similarity is insufficient, the compared video is considered not a replica and the process ends at 32.

十分な類似性がある場合は段階５で、フレームマッチャー３３は、映像挿入、削除、または置換に対するマッチしないフレームの位置をチェックする。 If there is sufficient similarity, at step 5, the frame matcher 33 checks the position of the unmatched frame for video insertion, deletion or replacement.

複製物検出プロセスの結果は、記憶される映像の管理に用いるために映像データベース１９に送られる。クエリ映像が複製物ではないことが分かった場合は、映像データベース１９はそれを記憶するために受け入れる。クエリ映像が複製物であることが分かった場合は、一実施形態では次いで映像データベース１９は、ユーザにそれを通知するためのメッセージを伴ってまたは伴わずに、クエリ映像を拒絶する。 The result of the duplicate detection process is sent to the video database 19 for use in managing the stored video. If the query video is found not to be a replica, the video database 19 accepts it for storage. If the query video is found to be a duplicate, in one embodiment, the video database 19 then rejects the query video with or without a message to inform the user.

代替実施形態または代替形態では、クエリ映像が複製物であることが分かった場合は、映像データベース１９に受け入れられるが、好ましくはそれがマッチした目標映像への参照を有して、複製物として表示される。複製物映像はグループに一緒に集めることができる。データベース上で行われる検索がグループの１つを呼び出したときは、他のグループ要素は検索結果から削除することができ、またはいずれの複製物も他の非複製物の後に提示される傾向をもつように、検索結果においてそうでない場合に受け得るよりも低いランキングが与えられる。 In an alternative embodiment or alternative, if the query video is found to be a duplicate, it is accepted into the video database 19, but preferably it has a reference to the matched target video and is displayed as a replica. Is done. Duplicate images can be collected together in a group. When a search performed on the database calls one of the groups, other group elements can be removed from the search results, or any replicas tend to be presented after other non-replicas As such, the search results are given a lower ranking than would otherwise be received.

図９の映像管理装置は、クエリ映像が提出される前に、映像データベース１９内に保持される映像が分割され、２１および２２で分割され処理されるように変更することができる。たとえば一実施形態では、複製物について調べるように映像が提出されたときに得られたデータは保持され、映像データベース１９に送って記憶することができる。その映像がその後にデータベース１９に受け入れられなかった場合はそのデータは削除される。映像がデータベースに受け入れられたときは、それに関連付けられたデータは保持され、アライナ２６での使用に利用可能となる。別の実施形態では映像データベース１９内の映像は、必ずしも複製物のテストに用いられていなくても、段階１および段階２で分割および処理することができる。たとえばデータ処理は、新しい映像を受け取るためにデータベースを開放する前の準備段階の一部として実行することができる。 The video management apparatus of FIG. 9 can be changed so that the video held in the video database 19 is divided and divided and processed at 21 and 22 before the query video is submitted. For example, in one embodiment, the data obtained when a video is submitted to be examined for a replica can be retained and sent to the video database 19 for storage. If the video is not subsequently accepted by the database 19, the data is deleted. When the video is accepted by the database, the data associated with it is retained and made available for use by the aligner 26. In another embodiment, the video in the video database 19 can be split and processed in stage 1 and stage 2 even though not necessarily used for replica testing. For example, data processing can be performed as part of a preparatory step before opening the database to receive new video.

「プロセッサ」として名前が付けられたいずれの機能ブロックを含む、図に示された様々な要素の機能は、専用のハードウェア、ならびに適当なソフトウェアに関連してソフトウェアを実行することができるハードウェアを用いることによって実現することができる。プロセッサによって実現されるとき、これらの機能は、単一の専用プロセッサ、単一の共有されたプロセッサ、その一部を共有することができる複数の個別のプロセッサによって実現することができる。さらに「プロセッサ」という用語の明示的な使用は、ソフトウェアを実行することができるハードウェアを排他的に指すと解釈されるべきではなく、非限定的に、デジタル信号プロセッサ（ＤＳＰ）ハードウェア、ネットワークプロセッサ、特定用途向け集積回路（ＡＳＩＣ）、フィールドプログラマブルゲートアレイ（ＦＰＧＡ）、ソフトウェアを記憶するためのリードオンリメモリ（ＲＯＭ）、ランダムアクセスメモリ（ＲＡＭ）、および不揮発性記憶装置を暗黙に含むことができる。従来型および／またはカスタムの他のハードウェアも含むことができる。 The functions of the various elements shown in the figure, including any functional blocks named as “processors”, are dedicated hardware as well as hardware capable of executing software in conjunction with appropriate software. It can be realized by using. When implemented by a processor, these functions can be implemented by a single dedicated processor, a single shared processor, or multiple individual processors that can share a portion thereof. Furthermore, the explicit use of the term “processor” should not be construed to refer exclusively to hardware capable of executing software, but is not limited to digital signal processor (DSP) hardware, network Implicitly includes a processor, application specific integrated circuit (ASIC), field programmable gate array (FPGA), read only memory (ROM) for storing software, random access memory (RAM), and non-volatile storage it can. Other conventional and / or custom hardware can also be included.

本発明は、その趣旨または本質的な特徴から逸脱せずに、他の特定の形で実施することができる。説明した実施形態は、すべての点で例示のみであり、限定的と見なされるべきではない。したがって本発明の範囲は上記の説明によってではなく、添付の特許請求の範囲によって示される。特許請求の範囲と均等な意味および範囲内に含まれるすべての変更形態は、それらの範囲に包含されるものとする。 The present invention may be embodied in other specific forms without departing from its spirit or essential characteristics. The described embodiments are exemplary only in all respects and should not be regarded as limiting. The scope of the invention is, therefore, indicated by the appended claims rather than by the foregoing description. All changes that come within the meaning and range of equivalency of the claims are to be embraced within their scope.

Claims

A method for comparing a query video with a target video in a video management device ,
By the video management device,
Dividing the query video frame and the target video frame into a plurality of blocks;
Calculating an average brightness value for each block;
Generating a plurality of query time series for the query video, each query time series representing a temporal variation in average brightness values for blocks from the same location in different frames of the query video; and
Generating a plurality of target time series for the target video, each target time series representing a temporal variation in average brightness values for blocks from the same location in different frames of the target video; and
Determining whether there is an alignment between the query video and the target video using the query time series and the target time series ;
Partitioning the query time series and the target time series into a set of discrete linear sections each;
Performing local sequence alignment of the linear segments .

The method of claim 1 , comprising: selecting a major ascending portion and a major descending portion from the segmented time series; and using the major ascending portion and the major descending portion for performing the alignment.

The method of claim 2 , wherein the selected major ascending and major descending portions exclude jump ascending and jump descending portions.

The main ascender and descender of query image, as compared with the main rise portion and the main descending portion of the target image, comprising the steps of obtaining an approximate alignment with elevated portions and descending portions that match consecutive to claim 1 The method described.

5. The method of claim 4 , comprising matching a primary ascending / primary descending query video sequence with a primary ascending / primary descending target video sequence.

The matching step generates a matrix with cells plotting the primary ascending / primary descending query video sequence against the primary ascending / primary descending target video sequence, and if there is a match, an appropriate matrix The method of claim 5 , wherein the method is performed by adding a marker in a cell.

7. The method of claim 6 including the step of extending the primary rise / primary descent to the adjacent non-major rise / non-major fall after aligning the major rise / major fall.

8. The method of claim 7 , comprising identifying a diagonal of successive cells having markers and maintaining a diagonal having a length greater than a given threshold for additional alignment processing.

Selecting a K-number of the longest diagonal line, and a step of attempting to combine the divided each other closely spaced included in top K diagonal to form a longer segment, to claim 8 The method described.

Give a score that is added to diagonally matching lines and a penalty score for gaps in longer lines, and if the combined score of the connected approximate alignments exceeds a given score threshold, the connected Checking whether previously ignored initial short aligned segments around the defined segments can be combined to form an approximate alignment, and final exceeding the final score threshold for further review and selecting a local approximate alignment with scores method of claim 9.

3. The method of claim 2 , comprising obtaining an approximate alignment of the sections and selecting a set of likely alignments, and then applying a Smith-Waterman algorithm to the selected set.

The method of claim 11 , comprising aligning at approximately the frame level to approximately aligned segments that are not included in the selected set.

The method of claim 1, comprising storing the query video in a video database that holds the target video when it is determined that the query video is not a replica of the target video.

Dividing the query video frame and the target video frame into a plurality of blocks;
Calculating an average brightness value for each block;
Generating a plurality of query time series for the query video, each query time series representing a temporal variation in average brightness values for blocks from the same location in different frames of the query video; and
Generating a plurality of target time series for the target video, each target time series representing a temporal variation in average brightness values for blocks from the same location in different frames of the target video; and
Determining whether there is an alignment between the query video and the target video using the query time series and the target time series ;
Partitioning the query time series and the target time series into a set of discrete linear sections each;
Performing a local sequence alignment of the linear segments and an apparatus programmed or configured to perform a method.

15. The apparatus of claim 14 , programmed or configured to select a main ascending and a main descending from the segmented time series, and using the main ascending and the main descending for performing the alignment.

The apparatus of claim 15 , wherein the selected major rise and fall are excluded jump segments.

Programmed or configured to compare the primary ascending and descending parts of the query video with the main ascending and descending parts of the target video to obtain an approximate alignment with consecutive matched ascending and descending parts The apparatus according to claim 14 .

18. The apparatus of claim 17 , programmed or configured to match a primary ascending / primary descending query video sequence with a primary ascending / primary descending target video sequence.

Match by generating a matrix with cells that plot the primary ascending / primary descending query video sequence against the primary ascending / primary descending target video sequence, and if there is a match, The apparatus of claim 18 programmed or configured to add a marker in a cell.

20. The apparatus of claim 19 , programmed or configured to extend a primary ascension / primary descent to an adjacent non-major ascension / non-primary descent after aligning a primary ascension / primary descent. .

21. The apparatus of claim 20 , programmed or configured to identify a diagonal of successive cells having markers and to maintain a diagonal having a length greater than a given threshold for additional alignment processing.

Claims programmed or configured to select the K longest diagonals and attempt to combine adjacently located segments included in the top K diagonals to form longer segments The apparatus according to 21 .

Diagonal matching lines are given scoring scores, gaps in longer lines are given deduction scores, and concatenated approximate alignments combined when the combined score exceeds a given score threshold Check whether the earlier short aligned segments that were previously ignored around the segment can be combined to form an approximate alignment and have a final score that exceeds the final score threshold for further review 23. The apparatus of claim 22 , programmed or configured to select a general approximate alignment.

To obtain the approximate alignment of segment select alignments may be successful in one set, and then is programmed or configured to apply the Smith-Waterman algorithm to the selected set of claim 14 apparatus.

25. The apparatus of claim 24 , programmed or configured to perform alignment at a frame level for approximately aligned segments that are not included in a selected set.

15. The apparatus of claim 14 , programmed or configured to store the query video in a video database that holds the target video when it is determined that the query video is not a replica of the target video.

Dividing the query video frame and the target video frame into a plurality of blocks;
Calculating an average brightness value for each block;
Generating a plurality of query time series for the query video, each query time series representing a temporal variation in average brightness values for blocks from the same location in different frames of the query video; and
Generating a plurality of target time series for the target video, each target time series representing a temporal variation in average brightness values for blocks from the same location in different frames of the target video; and
Determining whether there is an alignment between the query video and the target video using the query time series and the target time series ;
Partitioning the query time series and the target time series into a set of discrete linear sections each;
A data storage medium storing a machine-executable program for performing a method of managing video content, comprising: performing local sequence alignment of those linear segments .