JP2020095707A

JP2020095707A - Alignment-free video change detection using deep blind image region prediction

Info

Publication number: JP2020095707A
Application number: JP2019211687A
Authority: JP
Inventors: リチャードタイラージェフリー; Richard Taylor Geoffrey
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2018-12-11
Filing date: 2019-11-22
Publication date: 2020-06-18
Anticipated expiration: 2039-11-22
Also published as: JP6967056B2; US10915755B2; US20200184224A1

Abstract

PROBLEM TO BE SOLVED: To provide a method of detecting a change in a scene between images captured at different times by a moving camera.
SOLUTION: A method 400 of detecting a change in a scene between images captured at different times by a moving camera comprises generating 450 an image corresponding to a query image captured by the moving camera using a reconstruction model based on reference images captured by the moving camera, and detecting 460 a change in the scene by comparing the query image with the generated image.
SELECTED DRAWING: Figure 4

Description

The present specification relates generally to image processing and, in particular, to detecting a change in a scene by comparing image sequences of the scene captured by respective moving cameras. The present specification also relates to a computer program product including a computer readable medium having recorded thereon a computer program for detecting a change in a scene by comparing image sequences of the scene captured from moving cameras.

Public places such as shopping centers, parking lots, and railway stations are increasingly subject to surveillance using large networks of video cameras. Application areas of video surveillance include security, safety, traffic management, and business analytics. The large volume of video generated by camera networks requires automated methods, known as "video analytics", to identify objects and events of interest and bring them to the attention of a user. A fundamental task in video analytics, known as "change detection", is to determine which parts of a scene have changed over time. Additional analysis, such as tracking and classification, is typically applied to the parts of the scene that are determined to have changed.

Methods for change detection from a fixed camera are available. In one method, called "background subtraction", a background model of the scene viewed by a fixed camera is formed by estimating the distribution of pixel values at each image location according to a mixture-of-Gaussians model. Given a test frame, each pixel is classified as "background" if its pixel value has a high likelihood under the background model; otherwise the pixel is labeled as "foreground". Foreground pixels are clustered into connected regions to detect moving objects. The drawback of this method is that the camera must remain fixed; otherwise, the distributions at different pixel locations are mixed together and cannot be used to separate the foreground from the background.
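The per-pixel classification described above can be sketched as follows. This is a deliberately simplified illustration, not the method of any cited system: it assumes grayscale frames stored as nested lists, and it replaces the mixture-of-Gaussians model with a single Gaussian per pixel. The function names and the threshold `k` are hypothetical.

```python
from statistics import mean, pstdev


def fit_background(frames):
    """Estimate a per-pixel (mean, standard deviation) pair from grayscale frames.

    Each frame is a list of rows; each row is a list of intensities.
    Simplification: one Gaussian per pixel instead of a mixture.
    """
    height, width = len(frames[0]), len(frames[0][0])
    model = []
    for y in range(height):
        row = []
        for x in range(width):
            samples = [frame[y][x] for frame in frames]
            # Floor the deviation so a perfectly static pixel still tolerates noise.
            row.append((mean(samples), max(pstdev(samples), 1.0)))
        model.append(row)
    return model


def foreground_mask(model, frame, k=3.0):
    """Label a pixel 1 (foreground) if it lies more than k deviations from the mean."""
    return [
        [1 if abs(value - mu) > k * sigma else 0
         for value, (mu, sigma) in zip(frame_row, model_row)]
        for frame_row, model_row in zip(frame, model)
    ]


# Three 2x2 background frames from a fixed camera, then one test frame.
background = [[[10, 10], [10, 10]], [[12, 10], [10, 11]], [[11, 10], [10, 9]]]
model = fit_background(background)
test_frame = [[11, 10], [90, 10]]
mask = foreground_mask(model, test_frame)
```

Because the model is indexed by pixel location, it only makes sense while each location keeps observing the same scene point, which is why the approach breaks down once the camera moves.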

For video surveillance over large geographical areas, a network of fixed video cameras is not practical, especially in remote locations where power and network infrastructure are limited. Applications of remote video surveillance include agricultural monitoring, critical infrastructure monitoring, border protection, and search and rescue. In one method, a surveillance camera is mounted on an airborne drone that conducts periodic patrols of a railway track along a fixed route. As noted above, conventional background subtraction methods cannot be used to detect scene changes in this scenario because the camera is constantly moving.

One method for change detection from a moving camera is to compare two image sequences of a scene captured by the moving camera at different times, for example during different patrols of a drone along the same fixed route. The problem of comparing two image sequences of a scene to detect changes is referred to as "video change detection" throughout this disclosure. Further, the first image sequence is referred to as the "reference" sequence and the second image sequence is referred to as the "query" sequence throughout this disclosure.

Video change detection is challenging for several reasons. First, each frame from a moving camera captures a different part of the overall scene, so that a given frame from the reference sequence may not correspond to the same part of the scene as a given frame from the query sequence. Second, the trajectory of the camera, and thus the viewpoint of the camera, may change between the two videos due to localization error and environmental conditions. A change in viewpoint shifts some parts of the scene relative to other parts; this shift is known as "parallax error" and may be falsely detected as a change in the scene. Finally, the reference and query sequences may be captured under different environmental conditions, producing changes in observed brightness, shadows, and reflections, which may also be falsely detected as scene changes.

The above challenges may be addressed by determining a temporal and spatial alignment of the reference and query sequences. In one method, the temporal alignment is determined by computing a low-dimensional feature vector for each frame and using an efficient search algorithm to find the nearest-neighbor reference frame for each query frame in the feature space. Given a pair of nearest-neighbor frames, spatial alignment is performed by determining pixel correspondences based on estimating local homographies between image patches. Finally, scene changes are detected by computing the difference between aligned pixels. A drawback of this method is that it cannot distinguish scene changes from changes due to parallax error caused by a change in viewpoint between the reference and query videos. Another drawback is that the computational cost of temporal alignment increases in proportion to the length of the reference sequence, which may be a significant cost for large-scale practical applications such as border protection or gas pipeline monitoring. Yet another drawback is that the method cannot distinguish scene changes from changes in lighting, shadows, and reflections that arise when the reference and query sequences are captured in different weather conditions.
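The temporal alignment step in this prior method can be illustrated with a brute-force nearest-neighbor search. The source does not specify the feature extractor or the "efficient search algorithm", so this sketch assumes the feature vectors are already computed and uses an exhaustive linear scan; all names are hypothetical.

```python
import math


def nearest_reference(query_feature, reference_features):
    """Return (index, distance) of the reference feature closest to the query."""
    best_index, best_distance = -1, math.inf
    for index, feature in enumerate(reference_features):
        distance = math.dist(query_feature, feature)  # Euclidean distance
        if distance < best_distance:
            best_index, best_distance = index, distance
    return best_index, best_distance


# One low-dimensional feature vector per frame (toy 2-D values).
reference_features = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
query_features = [(0.9, 0.1), (2.8, 0.0)]

# For each query frame, the index of its nearest reference frame.
alignment = [nearest_reference(q, reference_features)[0] for q in query_features]
```

Each query frame is compared against every reference frame here, so the work per query grows linearly with the reference sequence. An indexed search reduces the constant but still depends on the amount of reference data, which is the cost the passage identifies as a drawback.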

In another method, temporal and spatial alignment is established by tracking a predetermined region of interest (ROI) in both the reference and query videos, and associating keyframes corresponding to particular viewpoints with a viewpoint identifier such as a Global Positioning System (GPS) location. Keyframes from the reference sequence are used to learn a low-dimensional eigenspace. Keyframes from the reference and query sequences with corresponding viewpoint identifiers are projected into the eigenspace, and the distance between the projected points is computed. If the distance is greater than a threshold, the scene is determined to have changed. A drawback of this method is that it requires the reference and query sequences to be captured from substantially the same viewpoint. In practice, this can be difficult to achieve because localization methods such as GPS have errors of up to several meters. Another drawback is that the comparison is performed in the eigenspace rather than on the original images, and so does not localize specific changes within the ROI. Yet another drawback is that a predetermined ROI must be tracked in both the reference and query sequences; changes outside the ROI are not detected.

It is an object of the present invention to substantially overcome, or at least ameliorate, one or more disadvantages of existing configurations.

Disclosed are configurations, referred to as Alignment-free Video Change Detection (AVCD) configurations, which seek to address the above problems by reconstructing a query image using an image patch predictor trained on a reference video sequence, and detecting scene changes by comparing the reconstructed query image with the original query image.

According to one aspect of the present disclosure, there is provided a method of detecting a change in a scene between images capturing the scene at different times, the images including reference images and a query image, the method comprising: reconstructing the query image using a reconstruction model, the reconstruction model being based on the reference images; and detecting a change in the scene by comparing the query image with the reconstructed query image.

One or more embodiments of the present invention will now be described with reference to the following drawings.
FIGS. 1A and 1B collectively show an example of a moving camera capturing two image sequences of a scene at different times, to which the AVCD configurations may be applied.
FIGS. 2A and 2B are schematic block diagrams of a general-purpose computer system on which the described AVCD configurations can be implemented.
FIGS. 3A, 3B, 3C, and 3D collectively show an example of detecting a change in a scene according to one AVCD configuration.
FIG. 4 is a schematic flow diagram showing a method of detecting a change in a scene according to one AVCD configuration.
FIG. 5 is a schematic flow diagram showing a sub-process of training a reconstruction model based on a set of reference images, as used in the method of FIG. 4.
FIG. 6 shows an example of forming image patches around keypoints according to the sub-process of FIG. 5.
FIG. 7 is a schematic flow diagram showing a sub-process of reconstructing a query image based on a trained reconstruction model, as used in the method of FIG. 4.
FIG. 8 shows an example of selecting a predicted pixel value at a pixel location, as used in the sub-process of FIG. 7.
FIG. 9 is a schematic flow diagram showing a sub-process of predicting a patch, as used in the sub-processes of FIGS. 5 and 7.
FIG. 10 shows an example of forming an annular (donut-shaped) region around an image patch, as used in the method of FIG. 9.

Where reference is made in any one or more of the accompanying drawings to steps and/or features that have the same reference numerals, those steps and/or features have, for the purposes of this description, the same function or operation, unless the contrary intention appears.

It is to be noted that the discussions contained in the "Background" section and in the section above relating to prior art configurations relate to documents or devices that form public knowledge through their respective publication and/or use. Such discussions should not be interpreted as a representation by the present inventor or the patent applicant that such documents or devices in any way form part of the common general knowledge in the art.

Context

An image, such as the image 310 in FIG. 3, is made up of visual elements. The terms "pixel", "pixel location", and "image location" are used interchangeably throughout this specification to refer to one of the visual elements in a captured image. Each pixel of an image is described by one or more values, collectively referred to as a "pixel value", that characterize a property of the scene captured in the image. A pixel value may be a single intensity value (characterizing the brightness of the scene at the pixel location), a triplet of values (characterizing the color of the scene at the pixel location), and so on.

A "patch", "image patch", or "region" in an image, such as the patch 820 in FIG. 8, refers to a collection of one or more spatially adjacent visual elements. A "keypoint" in an image is a local image structure that has a well-defined location and can be detected with high repeatability despite local perturbations such as brightness changes or geometric deformation. One example of a keypoint is a "corner", which is a local image structure characterized by image gradients in multiple directions. Another example of a keypoint is a "blob", which is a local image structure characterized by high contrast between a central region and a surrounding region. A "bounding box" refers to a rectilinear boundary enclosing a patch, region, keypoint, or object in an image, such as the bounding box 322 in FIG. 3B. A "feature" or "image feature" represents a derived value or set of derived values determined from the pixel values in a patch. Examples of features include a histogram of color values in a patch, a histogram of quantized image gradient responses in a patch, a set of activations at a particular layer of an artificial neural network applied to a patch, and so on.
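As a concrete instance of a "feature" in the sense defined above, the following sketch computes a normalized histogram over the pixel values of a square patch. It assumes a grayscale image stored as nested lists with 8-bit intensities, whereas the text mentions color histograms; the function name, bin count, and patch layout are hypothetical.

```python
def patch_histogram(image, top, left, size, bins=4, max_value=256):
    """Normalized histogram of the intensities in a size-by-size patch."""
    counts = [0] * bins
    for row in image[top:top + size]:
        for value in row[left:left + size]:
            # Map an intensity in [0, max_value) to one of `bins` equal-width bins.
            counts[value * bins // max_value] += 1
    total = sum(counts)
    return [count / total for count in counts]


image = [
    [0, 10, 200, 255],
    [5, 70, 130, 250],
    [64, 64, 128, 192],
    [0, 0, 0, 0],
]
# Feature for the 2x2 patch whose top-left corner is at row 0, column 0.
feature = patch_histogram(image, top=0, left=0, size=2)
```

The result is a fixed-length vector regardless of patch size, which is what allows features from differently sized patches or regions to be compared or fed to a predictor.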

The present disclosure provides a method of comparing a reference image sequence and a query image sequence captured at different times in order to determine a change in a scene. FIGS. 1A and 1B show an exemplary use case to which the AVCD configurations may be applied. The overall goal is to monitor an area in the vicinity of a gas pipeline 110 for changes due to faults, environmental threats, security threats, or other factors that may interfere with its operation. In this example, the query and reference videos are captured by a video camera 135 mounted on an airborne drone 130, as shown in FIG. 1A. In one configuration, the drone 130 carries a computer system on which the AVCD configuration may be applied. In another configuration, the video captured by the camera 135 is transferred wirelessly, or downloaded, to a remote computer system on which the AVCD configuration may be applied.

To capture the reference video, the drone 130 is deployed near the pipeline 110 and navigates along a predetermined path 120 while recording video of an area 140 in the vicinity of the pipeline 110. Later, the query video is captured by deploying the drone 130 to follow a path 125 (similar to the reference path 120) while recording video of the area 140 in the vicinity of the pipeline 110. In practice, the path 120 and the path 125 are unlikely to be identical, even if the drone navigates using the same waypoints during both deployments, because of inaccuracies in localization and flight control. The reference and query videos therefore capture the scene from somewhat different viewpoints.

In the example shown in FIG. 1B, a vehicle 150 enters the area 140 in the vicinity of the pipeline 110 during the time after the reference video is captured and before the query video is captured. The computer system compares the reference video and the query video, detects the change in the scene due to the presence of the vehicle, and triggers an appropriate response. Examples of an appropriate response include the computer system applying additional analysis to the query and reference videos to classify the scene change, the computer system transmitting information about the detected change to a user, and so on.

This exemplary AVCD configuration applies to a range of applications both within and outside the field of video surveillance. In one application, the camera is mounted on a ground vehicle, such as a truck or train, and is used to monitor transport infrastructure, such as roads or railway lines, for defects. In another application, the reference sequence comprises images of healthy internal tissue of a patient captured using a medical imaging method such as a CT scan, and the query image represents the same internal tissue captured at a later time. The AVCD configuration is applied to detect changes in the tissue that indicate a health risk. In yet another application, the reference sequence comprises images of correctly manufactured batches of integrated circuits captured at a particular point in the manufacturing process. The AVCD configuration is applied to query images of later batches to detect manufacturing defects.

Overview

Disclosed is an AVCD configuration comprising the steps of capturing a reference image sequence and a query image, training a reconstruction model based on the reference images, reconstructing the query image using the trained reconstruction model, and comparing the query image with the reconstructed query image to detect scene changes. The disclosed AVCD configuration does not require the reference image sequence and the query image to capture the scene from the same viewpoint or under the same environmental conditions. Furthermore, the disclosed AVCD configuration does not require temporal or spatial alignment of the query image to the reference images. This is because the AVCD configuration compares the query image with the reconstructed query image, and these two images are already aligned with each other and share the same viewpoint and environmental conditions. Finally, the disclosed AVCD configuration can be implemented with a fixed computational cost for the comparison and detection steps, independent of the length of the reference sequence.

FIGS. 3A, 3B, 3C, and 3D collectively show an example of comparing a reference image sequence and a query image to detect a scene change, according to one AVCD configuration. The reference images are used collectively to train a reconstruction model.

FIG. 3A shows an example of a single image frame 310 of the reference image sequence, the image frame 310 including a patch 312. In one AVCD configuration, the reconstruction model predicts the pixel values in the patch 312 based on features extracted from the pixels in an annular region 316. The annular region 316 is bounded by the border of the patch 312 and the border of a larger concentric patch 315. Examples of patch prediction methods include dictionary learning models, artificial neural networks (also referred to as deep neural networks), and so on. In one AVCD configuration, the reconstruction model is trained using multiple patches extracted from multiple reference images.
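The geometry of the annular region, the pixels between an inner patch and a larger concentric patch, can be sketched as follows. This only illustrates how the context pixels might be gathered; it does not implement the predictor. It assumes square patches with odd side lengths centered on a pixel, an image stored as nested lists, and a center far enough from the image border that no bounds checks are needed. All names are hypothetical.

```python
def annulus_offsets(inner_size, outer_size):
    """Offsets (dy, dx) inside the outer square patch but outside the inner one.

    Both patches are centered on (0, 0) and have odd side lengths.
    """
    inner_half, outer_half = inner_size // 2, outer_size // 2
    return [
        (dy, dx)
        for dy in range(-outer_half, outer_half + 1)
        for dx in range(-outer_half, outer_half + 1)
        if max(abs(dy), abs(dx)) > inner_half  # skip the inner patch
    ]


def annulus_values(image, center_y, center_x, inner_size, outer_size):
    """Pixel values in the annulus around (center_y, center_x), in row-major order."""
    return [image[center_y + dy][center_x + dx]
            for dy, dx in annulus_offsets(inner_size, outer_size)]


# A 5x5 image whose value encodes its position: value = 10 * row + column.
image = [[10 * y + x for x in range(5)] for y in range(5)]
context = annulus_values(image, 2, 2, inner_size=1, outer_size=3)
```

Because the inner patch is excluded from the input, a predictor trained this way never sees the pixels it is asked to produce, so its output reflects only what the surrounding context and the reference data make plausible.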

FIG. 3B shows an example of a query image 320 showing the same part of the scene as the reference image 310 of FIG. 3A. During the time between the capture of the reference image 310 and the capture of the query image 320, the scene has changed due to the appearance of a vehicle 327.

FIG. 3C shows an example of a reconstructed query image 330, computed by processing the query image 320 using a reconstruction model trained on reference images that include the reference image 310 of FIG. 3A. A pixel value 333 in the reconstructed query image 330 is determined in part by predicting a patch 332 that contains the pixel 333. The patch 332 is predicted based on features extracted from the pixels in the annular region between the corresponding patch 322 at the same location in the query image 320 and a larger concentric patch 325 in the query image 320. The reconstruction process is applied at every pixel location to determine the complete reconstructed query image 330. The vehicle 327 in the query image 320 does not appear in the reconstructed query image 330, because the vehicle 327 cannot be predicted by a reconstruction model trained on the reference image 310, in which the vehicle 327 does not appear.

FIG. 3D shows an example of a change mask 340 determined by comparing the reconstructed query image 330 of FIG. 3C with the query image 320 of FIG. 3B. In one AVCD configuration, the value at a pixel location 343 in the change mask 340 is a binary value determined by computing the absolute difference between the value at the corresponding pixel location 333 in the reconstructed query image 330 and the value at the corresponding pixel location 323 in the query image 320, and applying a threshold.

In the example of FIG. 3D, the pixel at 333 in the reconstructed query image 330 does not match the pixel at 323 in the query image 320, so the pixel at location 343 in the change mask 340 is assigned the binary value 1. Conversely, the pixel at 334 in the reconstructed query image 330 matches the pixel at 324 in the query image 320, so the pixel at 344 in the change mask 340 is assigned the binary value 0.
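The thresholded absolute difference that produces the change mask can be sketched in a few lines. The sketch assumes single-channel images of identical size stored as nested lists; the function name and the threshold value are hypothetical, since the passage above does not state a specific threshold.

```python
def change_mask(query, reconstructed, threshold):
    """Binary mask: 1 where |query - reconstructed| exceeds the threshold, else 0."""
    return [
        [1 if abs(q - r) > threshold else 0 for q, r in zip(query_row, recon_row)]
        for query_row, recon_row in zip(query, reconstructed)
    ]


# The bottom-left query pixel (30) differs strongly from its reconstruction (99),
# as a pixel on an unpredicted object would; the others differ only slightly.
query = [[100, 102], [30, 101]]
reconstructed = [[101, 100], [99, 100]]
mask = change_mask(query, reconstructed, threshold=20)
```

The comparison is purely per pixel and touches only the query image and its reconstruction, which is consistent with the statement that the comparison and detection cost does not depend on the length of the reference sequence.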

Structural Environment

FIGS. 2A and 2B show a general-purpose computer system 250 on which the various AVCD configurations described can be implemented.

As seen in FIG. 2A, the computer system 250 includes: a computer module 201; input devices such as a keyboard 202, a mouse pointer device 203, a scanner 226, one or more cameras such as cameras 261 and 262, and a microphone 280; and output devices including a printer 215, a display device 214, and loudspeakers 217. An external modulator-demodulator (modem) transceiver device 216 may be used by the computer module 201 to communicate with a remote camera, such as the camera 260, over a communications network 220 via a connection 221. The communications network 220 may be a wide area network (WAN), such as the Internet, a cellular telecommunications network, or a private WAN. Where the connection 221 is a telephone line, the modem 216 may be a traditional "dial-up" modem. Alternatively, where the connection 221 is a high-capacity (for example, cable) connection, the modem 216 may be a broadband modem. A wireless modem may also be used for wireless connection to the communications network 220.

The computer module 201 typically includes at least one processor unit 205 and a memory unit 206. For example, the memory unit 206 may have semiconductor random access memory (RAM) and semiconductor read only memory (ROM). The computer module 201 also includes a number of input/output (I/O) interfaces, including: an audio-video interface 207 that couples to the video display 214, the loudspeakers 217, and the microphone 280; an I/O interface 213 that couples to the keyboard 202, the mouse 203, the scanner 226, the camera 261, and optionally a joystick or other human interface device (not illustrated); and an interface 208 for the external modem 216 and the printer 215. In some implementations, the modem 216 may be incorporated within the computer module 201, for example within the interface 208. The computer module 201 also has a local network interface 211, which permits coupling of the computer system 250 via a connection 223 to a local-area communications network 222, known as a Local Area Network (LAN). As illustrated in FIG. 2A, the local communications network 222 may also couple to the wide network 220 via a connection 224, which would typically include a so-called "firewall" device or a device of similar functionality. The local network interface 211 may comprise an Ethernet (registered trademark) circuit card, a Bluetooth (registered trademark) wireless configuration, or an IEEE 802.11 wireless configuration; however, numerous other types of interfaces may be implemented for the interface 211.

The I/O interfaces 208 and 213 may afford either or both of serial and parallel connectivity, the former typically being implemented according to the Universal Serial Bus (USB) standards and having corresponding USB connectors (not illustrated). Storage devices 209 are provided and typically include a hard disk drive (HDD) 210. Other storage devices, such as a floppy (registered trademark) disk drive and a magnetic tape drive (not illustrated), may also be used. An optical disk drive 212 is typically provided to act as a non-volatile source of data. Portable memory devices, such as optical disks (for example, CD-ROM, DVD, Blu-ray Disc™), USB-RAM, portable external hard drives, and floppy (registered trademark) disks, may be used as appropriate sources of data to the system 250.

The components 205 to 213 of the computer module 201 typically communicate via an interconnected bus 204 in a manner that results in a conventional mode of operation of the computer system 250 known to those skilled in the art. For example, the processor 205 is coupled to the system bus 204 using a connection 218. Likewise, the memory 206 and the optical disk drive 212 are coupled to the system bus 204 by connections 219. Examples of computers on which the described arrangements can be practiced include IBM-PCs and compatibles, Sun Sparcstations, Apple Mac, or similar computer systems.

The AVCD method may be implemented using the computer system 250, wherein the processes of FIGS. 4, 5, 7, and 9, to be described, may be implemented as one or more AVCD software application programs 233 executable within the computer system 250. In particular, the steps of the AVCD method are effected by instructions 231 (see FIG. 2B) in the software 233 that are carried out within the computer system 250. The software instructions 231 may be formed as one or more code modules, each for performing one or more particular tasks. The software may also be divided into two separate parts, in which a first part and the corresponding code modules perform the AVCD method, and a second part and the corresponding code modules manage a user interface between the first part and the user.

The AVCD software may be stored in a computer readable medium, including, for example, the storage devices described below. The software is loaded into the computer system 250 from the computer readable medium and then executed by the computer system 250. A computer readable medium having such software or computer program recorded on it is a computer program product. The use of the computer program product in the computer system 250 preferably effects an advantageous apparatus for implementing the AVCD method.

The software 233 is typically stored in the HDD 210 or the memory 206. The software is loaded into the computer system 250 from a computer readable medium and executed by the computer system 250. Thus, for example, the software 233 may be stored on an optically readable disk storage medium (for example, CD-ROM) 225 that is read by the optical disk drive 212. A computer readable medium having such software or computer program recorded on it is a computer program product. The use of the computer program product in the computer system 250 preferably effects an apparatus for implementing the AVCD configurations.

In some instances, the AVCD application programs 233 may be supplied to the user encoded on one or more CD-ROMs 225 and read via the corresponding drive 212, or alternatively may be read by the user from the networks 220 or 222. Still further, the software can also be loaded into the computer system 250 from other computer readable media. Computer readable storage media refers to any non-transitory tangible storage medium that provides recorded instructions and/or data to the computer system 250 for execution and/or processing. Examples of such storage media include floppy disks, magnetic tape, CD-ROM, DVD, Blu-ray Disc, a hard disk drive, a ROM or integrated circuit, USB memory, a magneto-optical disk, or a computer readable card such as a PCMCIA card, whether or not such devices are internal or external to the computer module 201. Examples of transitory or non-tangible computer readable transmission media that may also participate in the provision of software, application programs, instructions and/or data to the computer module 201 include radio or infra-red transmission channels, a network connection to another computer or networked device, and the Internet or intranets, including e-mail transmissions and information recorded on websites and the like.

The second part of the application programs 233 and the corresponding code modules mentioned above may be executed to implement one or more graphical user interfaces (GUIs) to be rendered or otherwise represented upon the display 214. Through manipulation of, typically, the keyboard 202 and the mouse 203, a user of the computer system 250 and the application may manipulate the interface in a functionally adaptable manner to provide controlling commands and/or input to the applications associated with the GUI(s). Other forms of functionally adaptable user interfaces may also be implemented, such as an audio interface utilizing speech prompts output via the speaker 217 and user voice commands input via the microphone 280.

FIG. 2B is a detailed schematic block diagram of the processor 205 and a "memory" 234. The memory 234 represents a logical aggregation of all the memory modules (including the HDD 209 and the semiconductor memory 206) that can be accessed by the computer module 201 in FIG. 2A.

When the computer module 201 is initially powered up, a power-on self-test (POST) program 250 executes. The POST program 250 is typically stored in a ROM 249 of the semiconductor memory 206 of FIG. 2A. A hardware device such as the ROM 249 storing software is sometimes referred to as firmware. The POST program 250 examines hardware within the computer module 201 to ensure proper functioning, and typically checks the processor 205, the memory 234 (209, 206), and a basic input-output system software (BIOS) module 251, also typically stored in the ROM 249, for correct operation. Once the POST program 250 has run successfully, the BIOS 251 activates the hard disk drive 210 of FIG. 2A. Activation of the hard disk drive 210 causes a bootstrap loader program 252 that is resident on the hard disk drive 210 to execute via the processor 205. This loads an operating system 253 into the RAM memory 206, upon which the operating system 253 commences operation. The operating system 253 is a system level application, executable by the processor 205, that fulfils various high level functions, including processor management, memory management, device management, storage management, software application interface, and generic user interface.

The operating system 253 manages the memory 234 (209, 206) to ensure that each process or application running on the computer module 201 has sufficient memory in which to execute without colliding with memory allocated to another process. Furthermore, the different types of memory available in the system 250 of FIG. 2A must be used properly so that each process can run effectively. Accordingly, the aggregated memory 234 is not intended to illustrate how particular segments of memory are allocated (unless otherwise stated), but rather to provide a general view of the memory accessible by the computer system 250 and how such memory is used.

As shown in FIG. 2B, the processor 205 includes a number of functional modules, including a control unit 239, an arithmetic logic unit (ALU) 240, and a local or internal memory 248, sometimes called a cache memory. The cache memory 248 typically includes a number of storage registers 244 to 246 in a register section. One or more internal buses 241 functionally interconnect these functional modules. The processor 205 typically also has one or more interfaces 242 for communicating with external devices via the system bus 204, using a connection 218. The memory 234 is coupled to the bus 204 using a connection 219.

The AVCD application program 233 includes a sequence of instructions 231 that may include conditional branch and loop instructions. The program 233 may also include data 232 that is used in execution of the program 233. The instructions 231 and the data 232 are stored in memory locations 228, 229, 230 and 235, 236, 237, respectively. Depending upon the relative size of the instructions 231 and the memory locations 228 to 230, a particular instruction may be stored in a single memory location, as depicted by the instruction shown in the memory location 230. Alternatively, an instruction may be segmented into a number of parts, each of which is stored in a separate memory location, as depicted by the instruction segments shown in the memory locations 228 and 229.

In general, the processor 205 is given a set of instructions which are executed therein. The processor 205 then waits for a subsequent input, to which it reacts by executing another set of instructions. Each input may be provided from one or more of a number of sources, including data generated by one or more of the input devices 202, 203, data received from an external source across one of the networks 220, 222, data retrieved from one of the storage devices 206, 209, or data retrieved from a storage medium 225 inserted into the corresponding reader 212, all depicted in FIG. 2A. The execution of a set of instructions may in some cases result in output of data. Execution may also involve storing data or variables to the memory 234.

The disclosed AVCD configurations use input variables 254, which are stored in the memory 234 in corresponding memory locations 255, 256, 257. The AVCD configurations produce output variables 261, which are stored in the memory 234 in corresponding memory locations 262, 263, 264. Intermediate variables 258 may be stored in memory locations 259, 260, 266, and 267.

Referring to the processor 205 of FIG. 2B, the registers 244, 245, 246, the arithmetic logic unit (ALU) 240, and the control unit 239 work together to perform the sequences of micro-operations needed to carry out "fetch, decode, and execute" cycles for every instruction in the instruction set making up the program 233. Each fetch, decode, and execute cycle comprises:
・a fetch operation, which fetches or reads an instruction 231 from a memory location 228, 229, 230;
・a decode operation, in which the control unit 239 determines which instruction has been fetched; and
・an execute operation, in which the control unit 239 and/or the ALU 240 execute the instruction.

Thereafter, a further fetch, decode, and execute cycle for the next instruction may be executed. Similarly, a store cycle may be performed, by which the control unit 239 stores or writes a value to a memory location 232.

Each step or sub-process in the processes of FIGS. 4, 5, 7, and 8 is associated with one or more segments of the program 233 and is performed by the register section 244, 245, 247, the ALU 240, and the control unit 239 in the processor 205 working together to carry out the fetch, decode, and execute cycles for every instruction in the instruction set for the noted segments of the program 233.

The AVCD method may alternatively be implemented in dedicated hardware, such as one or more integrated circuits performing the AVCD functions or sub-functions. Such dedicated hardware may include graphic processors, digital signal processors, or one or more microprocessors and associated memories, and may reside on platforms such as video cameras.

AVCD Method
FIG. 4 shows a method 400 of detecting a change in a scene by comparing a reference image sequence with a query image, according to one AVCD configuration. The method 400 may be implemented as one or more software code modules of the software application program 233 resident in the hard disk drive 210 and controlled in its execution by the processor 205. The following description provides details, examples, and alternative implementations for the main steps of the method 400. Further details, examples, and alternative implementations of sub-processes 430 and 450 are described later with reference to FIGS. 5 and 7, respectively.

The method 400 starts at a first receiving step 420, in which a set of reference images is received as input. In one example, the reference images show a scene captured by a moving camera mounted on a drone navigating along a predetermined path. In this example, since the camera is moving, the reference images capture the scene from viewpoints that differ from one another. In another example, the reference images are captured by a medical imaging device and show healthy tissue of a patient.

Control then passes from step 420 to a training sub-process 430, which trains a reconstruction model based on the reference images. The reconstruction model can then be used to determine a reconstructed query image for a query image, such that the reconstructed query image contains those aspects of the scene structure in the query image that were present in the reference images. Aspects of the scene that were not present in the reference images do not appear in the reconstructed query image. The function of the training process is therefore, in part, to learn the scene structure represented in the reference images. Further details, examples, and alternative implementations of sub-process 430 are described later with reference to FIG. 5.

Control then passes from sub-process 430 to a second receiving step 440, which receives a query image corresponding to the same scene as the reference images. The query image is captured at a different time from the reference images. In one AVCD configuration, owing to the capture process, the query image captures the same scene as the reference images from a somewhat different viewpoint. In one example, the query image shows a scene captured by a moving camera mounted on a drone navigating along a predetermined path similar to the path previously used to capture the reference image sequence. In another example, the query image shows tissue of a patient that was previously captured in the reference images using the same medical imaging device.

In one configuration, steps 420 and 440 are both performed before sub-process 430. Once sub-process 430 has completed, the method 400 in this configuration proceeds to step 450.

Control then passes from step 440 to a reconstruction sub-process 450, which determines a reconstructed query image based on the query image received in step 440 and the reconstruction model trained on the reference images in sub-process 430. The reconstructed query image contains those aspects of the scene structure in the query image that were present in the reference images. Aspects of the scene that changed between the capture of the reference images and the capture of the query image are not reconstructed. Further details, examples, and alternative implementations of sub-process 450 are described later with reference to FIG. 7.

Control then passes from sub-process 450 to a detection step 460, which determines a scene change based on the query image received in step 440 and the reconstructed query image determined in sub-process 450. In one AVCD configuration, as shown in FIGS. 3B, 3C, and 3D, the detection step 460 determines a change mask 340 by comparing the query image 320 with the corresponding reconstructed query image 330 determined in sub-process 450. The value at a pixel location 343 in the change mask 340 is a binary value determined by computing the absolute difference between the value at the corresponding pixel location 333 in the reconstructed query image 330 and the value at the corresponding pixel location 323 in the query image 320, and then applying a threshold. An example of applying a threshold is to assign the binary value 1 if the difference is greater than 30, and the value 0 otherwise. A change in the scene is determined to have occurred at each pixel location in the change mask that contains a non-zero value.
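The thresholding described for step 460 can be sketched as follows. This is a minimal illustration only, assuming single-channel images stored as nested Python lists of intensities; the function names `change_mask` and `has_change` are hypothetical and do not come from the specification.

```python
def change_mask(query, reconstructed, threshold=30):
    """Binary mask: 1 where |query - reconstructed| > threshold, else 0.

    Assumption: both images are equal-sized 2-D lists of grey-level values.
    """
    if len(query) != len(reconstructed) or any(
        len(q_row) != len(r_row) for q_row, r_row in zip(query, reconstructed)
    ):
        raise ValueError("query and reconstructed images must have the same shape")
    return [
        [1 if abs(q - r) > threshold else 0 for q, r in zip(q_row, r_row)]
        for q_row, r_row in zip(query, reconstructed)
    ]


def has_change(mask):
    """A change is reported if any mask location holds a non-zero value."""
    return any(any(row) for row in mask)


# Toy 2x3 images: the middle column and one corner differ strongly.
query = [[10, 200, 12], [11, 210, 13]]
reconstructed = [[12, 90, 10], [10, 95, 44]]
mask = change_mask(query, reconstructed)
```

A difference of exactly 30 yields 0 here, because the example threshold is stated as "greater than 30".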

Implementations of step 460 that post-process the change mask before determining a scene change may equally be practiced at step 460. In one alternative AVCD configuration, a set of morphological filtering operations is applied to the binary change mask to remove noise. An example of a morphological filtering operation is binary erosion. Another example is binary dilation. In another alternative AVCD configuration, connected component analysis is applied to the change mask to identify distinct regions. In yet another alternative AVCD configuration, the area of each detected region is determined, and any region with an area less than a fixed area threshold is discarded by setting the corresponding pixel values in the binary mask to zero. An example of a fixed area threshold is a square of 10×10 pixels. In yet another alternative AVCD configuration, additional features of a detected region are computed. One example of an additional feature is the centroid. Another example is the bounding box. Yet another example is the second moment of area. Rules are applied to the additional features to determine whether the corresponding region should be discarded. An example of a rule is to discard a region if its centroid lies outside an area of interest (AOI) in the image. An example of an AOI is a bounding box specified in the image by a user, indicating the image region in which the user wishes to detect change.
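Some of these post-processing options can be sketched in code: connected component analysis, the area threshold, and the centroid-in-AOI rule. The sketch below is illustrative only. It assumes 4-connectivity and an AOI given as inclusive `(top, left, bottom, right)` pixel bounds, neither of which is specified above, and all function names are hypothetical. Morphological filtering is omitted for brevity.

```python
from collections import deque


def connected_regions(mask):
    """Group non-zero pixels into 4-connected regions (lists of (row, col))."""
    if not mask or not mask[0]:
        return []
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    regions = []
    for r0 in range(height):
        for c0 in range(width):
            if not mask[r0][c0] or seen[r0][c0]:
                continue
            seen[r0][c0] = True
            queue, pixels = deque([(r0, c0)]), []
            while queue:  # breadth-first flood fill from the seed pixel
                r, c = queue.popleft()
                pixels.append((r, c))
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= nr < height and 0 <= nc < width
                            and mask[nr][nc] and not seen[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            regions.append(pixels)
    return regions


def centroid(pixels):
    """Mean (row, col) position of a region's pixels."""
    n = len(pixels)
    return (sum(r for r, _ in pixels) / n, sum(c for _, c in pixels) / n)


def filter_regions(mask, min_area, aoi):
    """Zero out regions with area < min_area or with centroid outside the AOI."""
    top, left, bottom, right = aoi
    cleaned = [[0] * len(row) for row in mask]
    for pixels in connected_regions(mask):
        cy, cx = centroid(pixels)
        if len(pixels) >= min_area and top <= cy <= bottom and left <= cx <= right:
            for r, c in pixels:
                cleaned[r][c] = 1
    return cleaned


mask = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1],
]
# Small values chosen for the toy mask; the text's example threshold is 10x10 pixels.
cleaned = filter_regions(mask, min_area=2, aoi=(0, 0, 2, 4))
```

In the toy mask, the isolated pixel is dropped by the area rule, and the two-pixel region on the bottom row is dropped because its centroid lies below the AOI.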

The method 400 concludes after completing the detection step 460.

FIG. 5 shows the sub-process 430 of training a reconstruction model based on a sequence of reference images. The sub-process 430 will now be described using the illustrative example from FIG. 6. The sub-process 430 may be implemented as one or more software code modules of the software application program 233 resident in the hard disk drive 210 and controlled in its execution by the processor 205. The following description provides details, examples, and alternative implementations for the main steps of the sub-process 430.

The sub-process 430 starts at a detection step 530, which detects keypoints in each of the reference images received in step 420. A "keypoint" in an image is a local image structure that has a well-defined location and can be detected with high repeatability despite local perturbations such as brightness changes or geometric deformations. One example of a keypoint is a "corner", which is a local image structure characterized by image gradients in multiple directions. Another example of a keypoint is a "blob", which is a local image structure characterized by high contrast between a central region and a surrounding region. Keypoints can be detected using a corner detection method, a blob detection method, and the like. Examples of corner detection methods include the Harris corner detector, the FAST corner detector, and the Shi-Tomasi corner detector. Examples of blob detection methods include the difference of Gaussians (DOG) blob detector and the maximally stable extremal regions (MSER) detector.

In one AVCD configuration, the keypoint detector applied at step 530 generates a response value associated with each detected keypoint. The response value indicates the strength of the response of the keypoint to the keypoint detector. The keypoint detector is applied to the whole image, and keypoints with a response value less than a fixed threshold are discarded. An example of a fixed threshold is 0.001 for a DOG detector.

In another AVCD configuration, the image is divided into a grid of non-overlapping cells of fixed size, and the detected keypoint with the highest response value in each grid cell is retained, while all other keypoints are discarded. An example of a fixed size is 16×16 pixels.
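The two keypoint selection strategies for step 530 (a fixed response threshold, or keeping the strongest keypoint per grid cell) can be sketched as below. The sketch assumes keypoints are already available as `(x, y, response)` tuples from some detector; the detector itself, the tuple layout, and the function names are assumptions made for illustration.

```python
def filter_by_threshold(keypoints, min_response=0.001):
    """Discard keypoints whose response value is below the fixed threshold."""
    return [kp for kp in keypoints if kp[2] >= min_response]


def strongest_per_cell(keypoints, cell_size=16):
    """Keep only the highest-response keypoint in each cell_size x cell_size cell.

    Each keypoint is an (x, y, response) tuple; cells do not overlap.
    """
    best = {}
    for x, y, response in keypoints:
        cell = (int(x) // cell_size, int(y) // cell_size)
        if cell not in best or response > best[cell][2]:
            best[cell] = (x, y, response)
    return sorted(best.values())


# Two keypoints fall in cell (0, 0) and two in cell (1, 0).
keypoints = [(3, 4, 0.5), (10, 12, 0.9), (20, 5, 0.2), (17, 1, 0.0005)]
thresholded = filter_by_threshold(keypoints)
per_cell = strongest_per_cell(keypoints)
```

The grid strategy spreads keypoints evenly over the image, whereas the threshold strategy may concentrate them in highly textured areas.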

Control then passes from step 530 to a determining step 535, which determines one or more patch bounding boxes in each of the reference images. In one AVCD configuration, a patch bounding box is determined for each keypoint detected in step 530. Referring to the illustrative example of FIG. 6, a patch bounding box 620 in a reference image 600 is determined as a square of fixed size centered on a keypoint 610. This configuration therefore determines patch bounding boxes based on the detected keypoints.

In another AVCD configuration, a patch bounding box is determined at every pixel location in the reference image, as a rectangle of fixed size centered on the pixel location. In yet another AVCD configuration, a set of non-overlapping patch bounding boxes is determined at step 535 by dividing the reference image into a regular grid of square patches of fixed size. An example of a fixed size is 16×16 pixels.
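Two of the bounding box strategies for step 535 can be sketched as follows. The `(left, top, right, bottom)` layout with exclusive right and bottom edges, the default size of 16 for keypoint-centered boxes, and the dropping of partial patches at the image border are all assumptions of this sketch, and the function names are hypothetical.

```python
def box_around_keypoint(x, y, size=16):
    """Fixed-size square centred on a keypoint, as (left, top, right, bottom).

    Right and bottom are exclusive. The box is not clipped to the image, so
    boxes near the border may extend outside it (handling is left open here).
    """
    half = size // 2
    return (x - half, y - half, x - half + size, y - half + size)


def grid_boxes(width, height, size=16):
    """Non-overlapping square patches tiling the image on a regular grid.

    Assumption: partial patches at the right and bottom edges are dropped.
    """
    return [
        (left, top, left + size, top + size)
        for top in range(0, height - size + 1, size)
        for left in range(0, width - size + 1, size)
    ]


keypoint_box = box_around_keypoint(40, 24)
tiles = grid_boxes(48, 32)
```

Applying `box_around_keypoint` at every pixel location instead of only at keypoints gives the dense variant described above.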

Control then passes from step 535 to an initialization step 540, which initializes the reconstruction model. In one AVCD configuration, the reconstruction model is a convolutional deep neural network, and the weights in the convolutional deep neural network are initialized randomly. In another AVCD configuration, the weights in the deep neural network are initialized in part by taking weights from a deep neural network previously trained on a different task. An example of a different task is image classification. Those skilled in the art will recognize that other methods for initializing the weights in a deep neural network may equally be practiced at step 540.

Control then passes from step 540 to a prediction sub-process 550, which uses the reconstruction model to compute a predicted patch for each patch determined in step 535. Consequently, the reconstruction model used in the first execution of sub-process 550 is the model initialized in step 540. The reconstruction model used in subsequent executions of sub-process 550 is the model updated in step 570 (described below).

In one AVCD configuration, a predicted image patch is determined by processing the reference image using the convolutional deep neural network (that is, the reconstruction model) and selecting the activations from the output layer of the convolutional deep neural network that correspond to the patch bounding box determined in step 535. Further details, examples, and alternative implementations of sub-process 550 are described later with reference to the method 900 of FIG. 9.

Control then passes from sub-process 550 to a computing step 560, which computes a training loss based on the patches determined in step 535 and the predicted patches determined in sub-process 550. In one AVCD configuration, a training loss known as the squared error loss is computed as the sum, over all pixels in all patch bounding boxes, of the squared difference between a pixel in a patch from the reference image (determined in step 535) and the pixel at the same location in the corresponding predicted patch (computed in sub-process 550).
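The squared error loss of step 560 can be written out directly. The sketch below assumes each patch is a 2-D nested list of pixel values and that the reference and predicted patches are supplied in matching order; the function name is hypothetical.

```python
def squared_error_loss(reference_patches, predicted_patches):
    """Sum of squared pixel differences over all pixels of all patch pairs."""
    if len(reference_patches) != len(predicted_patches):
        raise ValueError("expected one predicted patch per reference patch")
    total = 0.0
    for ref, pred in zip(reference_patches, predicted_patches):
        for ref_row, pred_row in zip(ref, pred):
            for r, p in zip(ref_row, pred_row):
                total += (r - p) ** 2
    return total


# One 2x2 patch: two pixels match, two differ by 0.5 each.
reference = [[[0.0, 1.0], [0.5, 0.5]]]
predicted = [[[0.0, 0.5], [0.5, 1.0]]]
loss = squared_error_loss(reference, predicted)
```

The loss is zero only when every predicted pixel equals the corresponding reference pixel.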

In another AVCD configuration, a training loss known as the binary cross-entropy loss is computed as the negative of the sum, over all pixels in all patch bounding boxes, of the product of a pixel value in a predicted patch and the logarithm of the pixel value at the same location in the corresponding patch from the reference image. Those skilled in the art will appreciate that other training losses may equally be computed at step 560.

Control then passes from step 560 to updating step 570, where the reconstruction model is updated based on the training loss determined in step 560. In one AVCD configuration, the reconstruction model is a deep neural network, and the weights in the model are updated by applying one iteration of an iterative optimization algorithm. One example of an iterative optimization algorithm is stochastic gradient descent, based on gradients of the loss function determined using backpropagation. Another example of an iterative optimization algorithm is AdaGrad. Those skilled in the art will recognize that one iteration of other iterative optimization algorithms may equally be computed at step 560.
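
A single iteration of each optimizer named above might look like the sketch below. It assumes the weights and their gradients are already available as flat lists (the backpropagation that produces the gradients is not shown); the learning rate, the `eps` term, and the function names are illustrative choices, not values from the specification.

```python
import math


def sgd_step(weights, grads, lr=0.01):
    """One gradient descent update: w <- w - lr * g."""
    return [w - lr * g for w, g in zip(weights, grads)]


def adagrad_step(weights, grads, accum, lr=0.01, eps=1e-8):
    """One AdaGrad update with a per-weight running sum of squared gradients."""
    new_accum = [a + g * g for a, g in zip(accum, grads)]
    new_weights = [
        w - lr * g / (math.sqrt(a) + eps)
        for w, g, a in zip(weights, grads, new_accum)
    ]
    return new_weights, new_accum


weights = [1.0, -2.0]
grads = [0.5, -1.0]
after_sgd = sgd_step(weights, grads, lr=0.1)
after_adagrad, accum = adagrad_step(weights, grads, [0.0, 0.0], lr=0.1)
```

On the first AdaGrad iteration the accumulated value equals the squared gradient, so every weight moves by approximately the learning rate regardless of gradient magnitude.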

Control then passes from step 570 to decision step 580, which determines whether training has converged and should be terminated. In one AVCD configuration, training is determined to have converged if the number of training iterations, that is, the number of repetitions of steps 550, 560, and 570, exceeds a fixed iteration threshold. An example of a fixed iteration threshold is 250 iterations.

In another AVCD configuration, training is determined to have converged if the training loss determined in step 560 falls below a fixed loss threshold. An example of a fixed loss threshold for the binary cross-entropy loss is 0.6. Those skilled in the art will recognize that other convergence criteria may equally be applied in step 570.

If training is determined not to have converged (NO), control passes from step 580 back to prediction sub-process 550. If training is determined to have converged (YES), sub-process 430 ends.
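
The loop formed by sub-process 550 and steps 560, 570, and 580 can be sketched as below. The `train` function and its callable arguments are hypothetical, and the toy model (a single scalar weight fitted to a target value) merely stands in for the reconstruction model. Both convergence tests are offered in one function for brevity, although the text presents them as separate configurations.

```python
def train(model, predict, compute_loss, update,
          iteration_threshold=250, loss_threshold=None):
    """Repeat predict / loss / update until a convergence test passes."""
    iterations = 0
    while True:
        predicted = predict(model)        # sub-process 550
        loss = compute_loss(predicted)    # step 560
        model = update(model, predicted)  # step 570
        iterations += 1
        # Step 580: stop once the iteration count exceeds the threshold,
        # or once the loss falls below the optional loss threshold.
        if iterations > iteration_threshold:
            break
        if loss_threshold is not None and loss < loss_threshold:
            break
    return model, iterations, loss


TARGET = 3.0  # toy target value, for illustration only


def predict(w):
    return w


def compute_loss(p):
    return (p - TARGET) ** 2


def update(w, p):
    return w - 0.1 * 2.0 * (p - TARGET)  # gradient step on the squared error


_, iters_fixed, _ = train(0.0, predict, compute_loss, update)
_, iters_loss, final_loss = train(0.0, predict, compute_loss, update,
                                  loss_threshold=1e-6)
```

With the iteration test alone the loop stops on the first iteration whose count exceeds 250; with the loss test enabled it stops earlier for this toy problem.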

Those skilled in the art will recognize that variations of sub-process 430 for training the reconstruction model may equally be practised. In one alternative embodiment of sub-process 430, the training data is augmented by creating additional reference images based on one or more randomized transformations of the reference images received in step 420. Examples of randomized transformations include a random rotation of a reference image, a random scaling of a reference image, a translation of a reference image, and a random gamma correction applied to a reference image. Those skilled in the art will recognize that other randomized transformations may equally be applied to generate augmented training data.
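
Two of the listed transformations, translation and gamma correction, can be sketched without any image library. The sketch assumes a grayscale image stored as a list of rows with values in [0, 1]; the shift range, gamma range, and function names are illustrative assumptions. Rotation and scaling are omitted because they need interpolation.

```python
import random


def gamma_correct(image, gamma):
    """Apply pixel ** gamma to every pixel (values assumed in [0, 1])."""
    return [[pixel ** gamma for pixel in row] for row in image]


def translate(image, dx, dy, fill=0.0):
    """Shift the image by (dx, dy) pixels, filling uncovered pixels."""
    height, width = len(image), len(image[0])
    out = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            sy, sx = y - dy, x - dx
            if 0 <= sy < height and 0 <= sx < width:
                out[y][x] = image[sy][sx]
    return out


def augment(image, rng, max_shift=2, gamma_range=(0.5, 2.0)):
    """Create one additional reference image from random transformations."""
    dx = rng.randint(-max_shift, max_shift)
    dy = rng.randint(-max_shift, max_shift)
    gamma = rng.uniform(*gamma_range)
    return gamma_correct(translate(image, dx, dy), gamma)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
reference = [[(x + y) / 6.0 for x in range(4)] for y in range(4)]
augmented = [augment(reference, rng) for _ in range(3)]
```

Each call to `augment` yields an image of the same size as the reference image, which can be appended to the training set.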

In another alternative embodiment of sub-process 430, step 540 initializes multiple instances of the reconstruction model with different random initializations. The training process of steps 550, 560, 570, and 580 is applied independently to each of the multiple instances. Finally, an additional selecting step (not shown in FIG. 5) selects the instance of the reconstruction model with the lowest training loss and discards the other instances.
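
The selection among independently trained instances can be sketched as follows. The toy `train_instance` function, which fits a single scalar weight from a seeded random start, is a hypothetical stand-in for training one instance of the reconstruction model.

```python
import random


def train_instance(seed, iterations=5, lr=0.1, target=3.0):
    """Train one toy instance from a seeded random initialization."""
    rng = random.Random(seed)
    weight = rng.uniform(-10.0, 10.0)
    for _ in range(iterations):
        weight -= lr * 2.0 * (weight - target)
    loss = (weight - target) ** 2
    return {"seed": seed, "weight": weight, "loss": loss}


def select_best(instances):
    """Keep the instance with the lowest training loss; the rest are dropped."""
    return min(instances, key=lambda inst: inst["loss"])


instances = [train_instance(seed) for seed in range(4)]
best = select_best(instances)
```

Because each instance is trained independently, the instances could also be trained in parallel before the selecting step.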

FIG. 7 shows sub-process 450 for reconstructing a query image based on a reconstruction model trained using sub-process 430 of FIG. 5. Sub-process 450 will now be described using illustrative examples from FIGS. 6 and 8. Sub-process 450 may be implemented as one or more software code modules of software application program 233 resident on hard disk drive 210, with its execution controlled by processor 205. The following description provides details, examples, and alternative implementations of the main steps of method 700.

Sub-process 450 starts at retrieving step 720, where a query image (received at step 440 of method 400) and a reconstruction model (trained using sub-process 430 of FIG. 5) are received as input.

Control then passes from step 720 to determining step 725, which determines a set of patch bounding boxes on the query image received in step 720. The set of patch bounding boxes comprises one or more patch bounding boxes. In one AVCD configuration, a patch bounding box is determined at every pixel location in the query image. The patch bounding boxes are determined in the same manner as illustrated in FIG. 6, where patch bounding box 620 at pixel location 610 of reference image 600 is determined as a square boundary of fixed size centered on pixel location 610. An example of a fixed size is 16×16 pixels.
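
Determining a fixed-size square box at every pixel location can be sketched as below. Because a 16×16 box has no single center pixel, the sketch assumes the box spans from `x - 8` up to but not including `x + 8`; this convention, the `(left, top, right, bottom)` tuple layout, and the function names are assumptions. Boxes near the border extend past the image in this sketch.

```python
def patch_bounding_box(x, y, size=16):
    """Square box around pixel (x, y) as (left, top, right, bottom)."""
    half = size // 2
    left, top = x - half, y - half
    return (left, top, left + size, top + size)  # right and bottom exclusive


def all_bounding_boxes(width, height, size=16):
    """One bounding box per pixel location, in row-major order."""
    return [patch_bounding_box(x, y, size)
            for y in range(height) for x in range(width)]


box = patch_bounding_box(20, 30)
boxes = all_bounding_boxes(4, 3)
```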

Control then passes from step 725 to detection sub-process 730, which computes a predicted patch corresponding to each patch bounding box determined in step 725, based on the reconstruction model trained using sub-process 430 of FIG. 5. Sub-process 730 shares embodiments with sub-process 550 of sub-process 430 of FIG. 5. Further details, examples, and alternative implementations of sub-process 730 are described later with reference to method 900 of FIG. 9.

Control then passes from step 730 to computing step 740, which computes a prediction error for each predicted patch determined in step 730. In one AVCD configuration, the prediction error is computed as a distance between a patch in the query image received in step 720 and the corresponding predicted patch determined in step 730. In one AVCD configuration, the computed distance is the sum of the squared differences between corresponding pixels in the pair of patches from steps 725 and 730. In another AVCD configuration, the computed distance is the normalized cross-correlation between the pair of patches from steps 725 and 730. Those skilled in the art will appreciate that other distance measures may equally be computed in step 740.
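
The two measures named above can be sketched as follows, assuming each patch is a list of rows. The normalized cross-correlation shown is the zero-mean variant, which is one common definition and an assumption here. Note that this quantity increases with similarity, so a configuration using it as an error would presumably need to convert it, for example to one minus the correlation; that conversion is also an assumption and is not shown.

```python
import math


def flatten(patch):
    """Turn a list of rows into one flat list of pixel values."""
    return [pixel for row in patch for pixel in row]


def sum_squared_differences(a, b):
    """Sum of squared differences between corresponding pixels."""
    return sum((p - q) ** 2 for p, q in zip(flatten(a), flatten(b)))


def normalized_cross_correlation(a, b):
    """Zero-mean normalized cross-correlation, in the range [-1, 1]."""
    fa, fb = flatten(a), flatten(b)
    mean_a, mean_b = sum(fa) / len(fa), sum(fb) / len(fb)
    numerator = sum((p - mean_a) * (q - mean_b) for p, q in zip(fa, fb))
    denominator = math.sqrt(sum((p - mean_a) ** 2 for p in fa)
                            * sum((q - mean_b) ** 2 for q in fb))
    return numerator / denominator if denominator > 0 else 0.0


query_patch = [[0.0, 1.0], [1.0, 0.0]]
predicted_patch = [[0.0, 0.5], [0.5, 0.0]]
ssd = sum_squared_differences(query_patch, predicted_patch)
ncc = normalized_cross_correlation(query_patch, predicted_patch)
```

The sample patches differ only by a contrast factor, so the sum of squared differences is non-zero while the correlation is at its maximum.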

Control then passes from step 740 to selecting step 750, which reconstructs the query image by selecting a pixel value at each pixel location, based on the patches predicted in step 730 and the prediction errors computed in step 740. FIG. 8 illustrates selecting a pixel value at location 810 in reconstructed query image 800. In one AVCD configuration, the pixel value is selected based on a single patch prediction. In one example, the center pixel of predicted patch 830 is selected and stored at location 810 in reconstructed query image 800.

In another AVCD configuration, the pixel value is selected based on multiple predictions. First, the set of all predicted patches whose bounding boxes contain pixel location 810, for example patches 820 and 830, is determined. Next, the patch with the smallest prediction error, as determined in step 740, is selected from this set. Finally, the pixel value predicted at location 810 by the selected patch is stored at location 810 in the reconstructed query image.
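
The three steps above (collect the covering patches, pick the one with the smallest error, read its value at the pixel) can be sketched as below. The dictionary layout with `left`, `top`, `pixels`, and `error` keys is a hypothetical representation chosen for illustration.

```python
def select_pixel(x, y, predictions):
    """Return the value at (x, y) from the covering patch with least error."""
    candidates = []
    for pred in predictions:
        rows, cols = len(pred["pixels"]), len(pred["pixels"][0])
        inside_x = pred["left"] <= x < pred["left"] + cols
        inside_y = pred["top"] <= y < pred["top"] + rows
        if inside_x and inside_y:
            candidates.append(pred)
    best = min(candidates, key=lambda p: p["error"])
    return best["pixels"][y - best["top"]][x - best["left"]]


# Two overlapping 2x2 predicted patches with their prediction errors.
predictions = [
    {"left": 0, "top": 0, "pixels": [[0.1, 0.2], [0.3, 0.4]], "error": 0.5},
    {"left": 1, "top": 1, "pixels": [[0.9, 0.8], [0.7, 0.6]], "error": 0.2},
]
value_overlap = select_pixel(1, 1, predictions)
value_single = select_pixel(0, 0, predictions)
```

Pixel (1, 1) is covered by both patches, so the value comes from the second patch, which has the smaller error; pixel (0, 0) is covered only by the first.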

Those skilled in the art will appreciate that variations of the embodiments described above for sub-process 450 may equally be practised. In one alternative AVCD configuration, multiple patches containing pixel 810 are selected in step 750 based on a selection criterion. One example of a selection criterion is that the corresponding prediction error determined in step 740 is below a fixed threshold. An example of a fixed threshold is 0.1. In this configuration, a function of the selected patches is determined and stored at location 810 in the reconstructed query image. One example of a function is the mean of the predicted values at location 810. Another example of a function is the median of the predicted values at location 810. In another alternative AVCD configuration, a set of non-overlapping patch bounding boxes is determined in step 725 by dividing the query image into a regular grid of fixed-size square patches. In this configuration, each pixel location is predicted by a single patch in the grid in step 730. In step 750, the pixel value at each location in the reconstructed query image is taken from the corresponding predicted patch.
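
Both alternatives above can be sketched briefly. The first function combines the values predicted at one pixel by every patch whose error is below the threshold; the second divides an image into a regular grid of boxes. The function names, the `None` fallback when no patch qualifies, and the handling of image sizes that are not a multiple of the patch size (boxes simply extend past the border) are assumptions.

```python
import statistics


def fuse_predictions(values_and_errors, threshold=0.1, reducer=statistics.mean):
    """Combine predicted values whose prediction error is below threshold."""
    selected = [value for value, error in values_and_errors if error < threshold]
    if not selected:
        return None  # hypothetical fallback; not specified in the text
    return reducer(selected)


def grid_bounding_boxes(width, height, size=16):
    """Non-overlapping boxes as (left, top, right, bottom), row-major."""
    return [(x, y, x + size, y + size)
            for y in range(0, height, size)
            for x in range(0, width, size)]


# (predicted value at the pixel, prediction error of the patch)
candidates = [(0.2, 0.05), (0.4, 0.08), (0.9, 0.5), (0.3, 0.01)]
mean_value = fuse_predictions(candidates)
median_value = fuse_predictions(candidates, reducer=statistics.median)
grid = grid_bounding_boxes(64, 32)
```

The third candidate is excluded because its error exceeds the threshold, so both the mean and the median are taken over the remaining three values.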

Sub-process 450 ends after completing selecting step 750.

FIG. 9 shows method 900 of predicting patches based on a reconstruction model trained using sub-process 430 of FIG. 5. Method 900 is used by sub-process 550 shown in FIG. 5 and by sub-process 730 shown in FIG. 7. Method 900 will now be described using an illustrative example from FIG. 10. Method 900 may be implemented as one or more software code modules of software application program 233 resident on hard disk drive 210, with its execution controlled by processor 205.

Method 900 starts at retrieving step 920, where an image (either a reference image for sub-process 430 or a query image for sub-process 450), a reconstruction model trained (or initialized) using sub-process 430 of FIG. 5, and a set of patch bounding boxes (see steps 535 and 725) are received as input.

Control then passes from step 920 to pre-processing step 930, which applies pre-processing to the received image. In one AVCD configuration, the pixel values in the received image are converted to a particular color space, such as an intensity color space or the CIELAB color space. In another AVCD configuration, the received image is cropped and resized to a fixed size, such as 256×256 pixels. In yet another AVCD configuration, the image is filtered using a noise-reducing filter, such as a Gaussian blur filter with a standard deviation of 2 pixels. In yet another AVCD configuration, pre-processing step 930 performs no pre-processing of the image.
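
Two of the pre-processing options above, conversion to intensity and Gaussian blurring with a standard deviation of 2 pixels, can be sketched in plain Python. The sketch assumes intensity is the unweighted mean of the RGB channels, truncates the Gaussian kernel at three standard deviations, and repeats edge pixels at the border; all three choices, and the function names, are assumptions.

```python
import math


def to_intensity(rgb_image):
    """Convert rows of (r, g, b) tuples to rows of mean intensities."""
    return [[(r + g + b) / 3.0 for r, g, b in row] for row in rgb_image]


def gaussian_kernel(sigma=2.0):
    """Normalized 1-D Gaussian kernel truncated at three sigma."""
    radius = int(3 * sigma)
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(image, kernel):
    """Convolve every row with the kernel, repeating edge pixels."""
    radius = len(kernel) // 2
    width = len(image[0])
    return [[sum(k * row[min(max(x + i - radius, 0), width - 1)]
                 for i, k in enumerate(kernel))
             for x in range(width)]
            for row in image]


def transpose(image):
    return [list(column) for column in zip(*image)]


def gaussian_blur(image, sigma=2.0):
    """Separable blur: filter rows, then columns."""
    kernel = gaussian_kernel(sigma)
    return transpose(blur_rows(transpose(blur_rows(image, kernel)), kernel))


gray = to_intensity([[(0.0, 0.5, 1.0)] * 8 for _ in range(8)])
blurred = gaussian_blur(gray, sigma=2.0)
```

A constant image is unchanged by the blur, which is a quick sanity check that the kernel is normalized.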

Control then passes from step 930 to extracting step 940, which extracts features from an annular region surrounding each patch, based on the patch bounding boxes and the reconstruction model received at step 920 and the pre-processed image determined at step 930. The annular region surrounding each patch may be defined by region 316 between patch 312 and patch 315, as shown in FIG. 3A.

In one AVCD configuration, feature extraction is implemented as a convolutional deep neural network in which one or more layers have annular convolution filters. FIG. 10 shows an example of square convolution filter 1030 and annular convolution filter 1010 applied to image 1000. The shape of square convolution filter 1030 is defined by bounding box 1035, and the filter contains non-zero weights only in the shaded region within bounding box 1035. The shape of annular convolution filter 1010 is defined by outer bounding box 1005 and inner bounding box 1020. Annular convolution filter 1010 contains non-zero weights only in the shaded region between outer bounding box 1005 and inner bounding box 1020.
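
An annular filter can be sketched as a square filter multiplied by a mask that is zero inside the inner bounding box. The 7×7 outer and 3×3 inner sizes, the constant weights, and the function names are illustrative assumptions.

```python
def annular_mask(outer=7, inner=3):
    """Mask that is 1 between the outer and inner boxes and 0 inside the inner box."""
    start = (outer - inner) // 2
    stop = start + inner
    return [[0.0 if start <= r < stop and start <= c < stop else 1.0
             for c in range(outer)]
            for r in range(outer)]


def filter_response(window, weights, mask):
    """Response of the masked filter on one image window of the same size."""
    size = len(mask)
    return sum(window[r][c] * weights[r][c] * mask[r][c]
               for r in range(size) for c in range(size))


mask = annular_mask()
weights = [[0.1] * 7 for _ in range(7)]  # stand-in for learned weights
window = [[float(r + c) for c in range(7)] for r in range(7)]
before = filter_response(window, weights, mask)
window[3][3] = 100.0                     # change a pixel inside the hole
after = filter_response(window, weights, mask)
```

Changing a pixel inside the inner box leaves the response unchanged, which is what makes a prediction from such a filter blind to the region it is asked to predict.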

In one example of a convolutional deep neural network used to extract features, the input layer is a single-channel grayscale image. The next layer has 64 square convolution filters of size 3×3×1, applied with a stride of 1 and rectified linear unit (ReLU) activation. The next layer has 64 square convolution filters of size 3×3×64, applied with a kernel dilation of 1, a stride of 1, and ReLU activation. The next layer applies max pooling with a pooling kernel of size 2×2 and a stride of 1. The next layer has 128 square convolution filters of size 3×3×64, applied with a kernel dilation of 2, a stride of 1, and ReLU activation. The next layer applies max pooling with a pooling kernel of size 2×2, applied with a kernel dilation of 2 and a stride of 1. The next layer has 192 square convolution filters of size 3×3×128, applied with a kernel dilation of 4, a stride of 1, and ReLU activation. The next layer applies max pooling with a pooling kernel of size 2×2, applied with a kernel dilation of 4 and a stride of 1. The next layer has 384 square convolution filters of size 3×3×192, applied with a kernel dilation of 8, a stride of 1, and ReLU activation. The next layer applies max pooling with a pooling kernel of size 2×2, applied with a kernel dilation of 8 and a stride of 1. The next layer has 768 annular convolution filters with an outer size of 7×7 and an inner size of 3×3, applied with a kernel dilation of 16, a stride of 1, and ReLU activation. The features extracted in step 940 of method 900 are the activations of the annular filters. The convolution filters in all layers are applied with zero padding, so that features are determined for every location in the image received in step 920.
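
As a rough check on how much of the image each annular feature can draw on, the receptive field of the example network can be derived from each layer's kernel size and kernel dilation (extension) factor. With a stride of 1 everywhere, a layer with kernel size k and dilation d widens the receptive field by (k - 1) * d pixels. The sketch below assumes a dilation of 1 for the first convolution and the first pooling layer, for which no dilation is stated; the layer table and function name are illustrative.

```python
# (description, kernel size, dilation); every layer has a stride of 1.
LAYERS = [
    ("conv 3x3, 64 filters", 3, 1),    # dilation not stated; assumed 1
    ("conv 3x3, 64 filters", 3, 1),
    ("max pool 2x2", 2, 1),            # dilation not stated; assumed 1
    ("conv 3x3, 128 filters", 3, 2),
    ("max pool 2x2", 2, 2),
    ("conv 3x3, 192 filters", 3, 4),
    ("max pool 2x2", 2, 4),
    ("conv 3x3, 384 filters", 3, 8),
    ("max pool 2x2", 2, 8),
    ("annular 7x7 with 3x3 hole, 768 filters", 7, 16),
]


def receptive_field(layers):
    """Width in pixels of the input region seen by one output activation."""
    field = 1
    for _description, kernel, dilation in layers:
        field += (kernel - 1) * dilation
    return field


field_width = receptive_field(LAYERS)
```

Under these assumptions the outer extent of the receptive field is 144 pixels on a side, most of which comes from the final annular layer with its dilation of 16.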

Those skilled in the art will appreciate that variations of the convolutional deep neural network embodiment described above may equally be practised. In one alternative embodiment, batch normalization is applied between each convolutional layer. In another alternative embodiment, the ReLU activation is replaced with a different activation function. One example of an activation function is linear activation. Another example of an activation function is sigmoid activation. In another alternative embodiment, the max pooling layers are replaced with a different pooling function. One example of a pooling function is average pooling. In another alternative embodiment, the annular filter is defined by an outer bounding box and has weights that gradually decrease to zero at the center of the filter. In one example, the filter weights are the product of learned weights and a two-dimensional inverse Gaussian function centered on the bounding box. The inverse Gaussian function is defined as one minus a Gaussian function with a fixed standard deviation. An example of a fixed standard deviation is one tenth of the width of the outer bounding box.
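
The inverse Gaussian weighting described at the end of this paragraph can be sketched as below. The sketch assumes an unnormalized Gaussian with a peak value of 1, so that the weight at the exact center is zero, and a 7×7 outer bounding box; the function names and the constant stand-in for learned weights are hypothetical.

```python
import math


def inverse_gaussian_window(size=7, sigma=None):
    """Window equal to 1 - exp(-d^2 / (2 sigma^2)), d = distance from center."""
    sigma = size / 10.0 if sigma is None else sigma  # one tenth of the width
    center = (size - 1) / 2.0
    return [[1.0 - math.exp(-((r - center) ** 2 + (c - center) ** 2)
                            / (2.0 * sigma * sigma))
             for c in range(size)]
            for r in range(size)]


def soft_annular_weights(learned, window):
    """Element-wise product of learned weights and the window."""
    return [[w * g for w, g in zip(w_row, g_row)]
            for w_row, g_row in zip(learned, window)]


window = inverse_gaussian_window()
learned = [[0.5] * 7 for _ in range(7)]
effective = soft_annular_weights(learned, window)
```

Unlike the hard mask of the basic annular filter, this weighting falls smoothly to zero at the center, so pixels near the center contribute little rather than nothing.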

Control then passes from step 940 to predicting step 950, which computes a predicted image patch corresponding to each patch bounding box received in step 920, based on the features extracted in step 940. In one AVCD configuration, the reconstruction model is a convolutional deep neural network, and the predicting step is implemented using network layers applied to the output of the feature extraction network. In one example, the prediction network comprises a layer of 256 convolution filters of size 1×1×768. The output of the convolution filters is mapped to a bounded interval using a bounded activation function. One example of a bounded activation function is sigmoid activation, which limits the output to the range [0, 1]. Finally, the predicted patch is formed by reshaping the output activations according to the size of the patches received in step 920. In one example, the 256 output activations at a particular location are reshaped into a 16×16 patch corresponding to the predicted patch at that location. Those skilled in the art will recognize that other fixed filter sizes and activation functions may equally be used in the convolutional deep neural network in step 950.
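
The final bounding and reshaping can be sketched as follows. The sketch assumes the 256 activations at one location are ordered row by row (row-major); that ordering and the function names are assumptions.

```python
import math


def sigmoid(value):
    """Bounded activation mapping any real value into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-value))


def activations_to_patch(activations, size=16):
    """Apply the sigmoid and reshape size*size activations into a square patch."""
    if len(activations) != size * size:
        raise ValueError("expected %d activations" % (size * size))
    bounded = [sigmoid(a) for a in activations]
    return [bounded[row * size:(row + 1) * size] for row in range(size)]


raw = [0.0] * 256  # stand-in for the raw filter outputs at one location
raw[17] = 4.0      # row 1, column 1 under the row-major assumption
patch = activations_to_patch(raw)
```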

Method 900 ends after completing predicting step 950.

The above description of sub-processes 430 and 450 and method 900 provides example embodiments based on a convolutional deep neural network reconstruction model. Those skilled in the art will recognize that other machine learning models may equally be used as the reconstruction model in alternative embodiments of sub-processes 430 and 450 and method 900. In one alternative AVCD configuration, the reconstruction model is based on dictionary learning, using a dictionary with a fixed number of atoms. An example of a fixed number is 1024 atoms. In one embodiment of step 540 of sub-process 430, the dictionary atoms are initialized randomly. In another embodiment of step 540 of sub-process 430, the dictionary atoms are initialized by applying K-singular value decomposition (K-SVD). In one embodiment of step 570 of sub-process 430, the dictionary is updated by applying one iteration of an alternating minimization algorithm. In one embodiment of step 940 of method 900, a feature of fixed dimension is computed by vectorizing the pixel values in a patch bounding box received in step 920. An example of a fixed dimension is 256, corresponding to the vectorized grayscale pixel values in a 16×16 pixel bounding box. In one embodiment of step 950 of method 900, a patch is predicted by computing a dictionary encoding based on the features computed in step 940 and the dictionary atoms received in step 920, and then reconstructing the patch by computing the sum of the products of the encoding coefficients and the corresponding dictionary atoms. In one example, the dictionary encoding is determined using the least angle regression method.
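
The vectorization and reconstruction steps of the dictionary-based alternative can be sketched as below. The encoding coefficients are assumed to be given; computing them (for example by least angle regression) is not shown. The tiny three-atom dictionary and the function names are illustrative only.

```python
def vectorize(patch):
    """Flatten a patch (list of rows) into one feature vector."""
    return [pixel for row in patch for pixel in row]


def reconstruct_patch(coefficients, atoms):
    """Sum of coefficient * atom over all atoms with a non-zero coefficient."""
    length = len(atoms[0])
    patch = [0.0] * length
    for coefficient, atom in zip(coefficients, atoms):
        if coefficient == 0.0:
            continue  # sparse codes leave most atoms unused
        for i in range(length):
            patch[i] += coefficient * atom[i]
    return patch


# A toy dictionary of three atoms for 2x2 patches (4-dimensional vectors).
atoms = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.5, 0.5, 0.5],
]
coefficients = [2.0, 0.0, 1.0]  # assumed output of the encoding step
feature = vectorize([[0.1, 0.2], [0.3, 0.4]])
predicted = reconstruct_patch(coefficients, atoms)
```

In the example from the text the vectors would have 256 entries and the dictionary 1024 atoms; the same two functions apply unchanged.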

Claims

A method of detecting a change in a scene between images captured at different times by a moving camera, the method comprising:
generating an image corresponding to a query image captured by the moving camera, using a reconstruction model based on a reference image captured by the moving camera; and
detecting a change in the scene by comparing the query image with the generated image.

Furthermore,
Training the reconstruction model from the reference image, the step comprising:
Determining one or more patch bounding boxes in the reference image;
Calculating a predictive patch for the one or more patch bounding boxes in the reference image using the reconstruction model;
Calculating a training loss based on the one or more patch bounding boxes of the reference image and corresponding prediction patches;
Updating a reconstruction model based on the training loss.

The method according to claim 2, wherein the calculation of the prediction patch, the calculation of the training loss, and the update of the reconstruction model are repeated until the training converges.

The method of claim 2, wherein in training the reconstruction model, the one or more patch bounding boxes of the reference image are determined based on keypoints detected in the reference image.

Generating an image corresponding to the query image,
Determining one or more patch bounding boxes in the query image;
Calculating a predicted patch for each of the one or more patch bounding boxes of the query image using the reconstruction model;
Calculating a prediction error for the prediction patch based on the one or more patch bounding boxes and corresponding prediction patches of the query image;
Reconstructing the query image by selecting a predictive pixel value at each pixel location based on the predictive patch and the corresponding predictive error.

The method according to claim 5, wherein the predicted pixel value is the value predicted by the prediction patch having the smallest prediction error.

The method of claim 2, wherein the prediction patch is calculated by extracting features from an annular region surrounding each of the one or more patch bounding boxes of the reference image and calculating the prediction patch based on the extracted features.

The method of claim 7, further comprising pre-processing the reference image before extracting the features, so that the features are extracted from the annular region.

The method of claim 5, wherein the prediction patch is calculated based on features extracted from an annular region surrounding each of the one or more patch bounding boxes of the query image.

The method of claim 9, further comprising pre-processing the query image before extracting the features, so that the features are extracted from the annular region.

The method of claim 1, wherein the reference image is a plurality of images of the scene captured from different viewpoints.

The method of claim 1, wherein the reconstruction model comprises an inverse Gaussian filter.

A program for causing a computer to execute the method according to any one of claims 1 to 12.

A computer-readable storage medium storing the program according to claim 13.