JP2019149112A

JP2019149112A - Composition device, method, and program

Info

Publication number: JP2019149112A
Application number: JP2018034847A
Authority: JP
Inventors: 敬介野中; Keisuke Nonaka
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2018-02-28
Filing date: 2018-02-28
Publication date: 2019-09-05
Anticipated expiration: 2038-02-28
Also published as: JP6898264B2

Abstract

PROBLEM TO BE SOLVED: To provide a free-viewpoint image composition device capable of high-speed processing and of coping with the occlusion problem. SOLUTION: A composition device 10 includes: a calculation unit 3 that obtains, from each viewpoint image of a multi-viewpoint image, a likelihood image of the region of the photographed subject; a backprojection unit 6 that obtains backprojection data by back-projecting each likelihood image onto a plurality of backprojection planes arranged in a three-dimensional space; and a drawing unit 7 that synthesizes a free-viewpoint image of the subject by rendering the subject, back-projecting and drawing the textures of all or some of the viewpoint images of the multi-viewpoint image, in order, onto the plurality of backprojection planes constituting the backprojection data. SELECTED DRAWING: Figure 2

Description

The present invention relates to a composition device, method, and program for free-viewpoint images that are capable of high-speed processing and can cope with the occlusion problem.

Techniques have been proposed for generating video from a free viewpoint at which no camera is placed (hereinafter, free-viewpoint video), for sports scenes and the like. Based on video captured by multiple cameras, such a technique synthesizes video from a virtual viewpoint where no camera is located and displays the result on a screen, so that the scene can be viewed from various viewpoints.

Among free-viewpoint video synthesis techniques, there is an existing technique that synthesizes high-quality free-viewpoint video by generating a 3DCG model of the subject using a principle called the visual volume intersection method. In this full-model approach, the outline information of the subject obtained from multiple cameras is back-projected into three-dimensional space and described as an enormous amount of point cloud data, so that the outline of the subject is reproduced precisely. (Some methods convert the data to polygons using a technique called marching cubes, but an enormous point cloud is still used as intermediate data.) A free-viewpoint video is generated by taking the 3DCG model of the subject generated in advance as input, determining the position of the virtual viewpoint, and rendering to a display.

Patent Document 1: Japanese Patent Application No. 2017-167472

In contrast to this full-model approach, the present applicant has proposed a technique that realizes the visual volume intersection method using a group of virtual planes, without going through point cloud data (Patent Document 1). This technique eliminates the need to access an enormous amount of data and makes it possible to perform everything from subject model generation to display of the synthesized video in one pass (synthesizing without writing out intermediate data), so it can synthesize free-viewpoint video much faster than methods that go through point cloud data. Patent Document 1 also proposes a method that adaptively changes the density and coordinates of the virtual plane group according to the coordinates of the virtual viewpoint selected by the user, so that the video synthesis suits the memory size of the GPU (Graphics Processing Unit) of the actual computer.

However, the method of Patent Document 1 leaves room for improvement. Specifically, because Patent Document 1 does not explicitly compute the depth of the subjects as seen from the cameras, it cannot take into account the front-to-back relationship of subjects within a camera image, which can result in unnatural synthesized video.

More specifically, as schematically shown in FIG. 1, consider a scene in which, as seen from a camera CA, two objects on the xy plane of the world coordinate system (shown as the xyz coordinate system) are lined up one behind the other, and one (the gray cylinder CLB) is hidden by the other (the white cylinder CLF). (The objects might be two people, such as athletes on a field, but two cylinders are used here as a schematic example.) Such hiding of one object by another within a camera image is hereinafter called occlusion. As illustrated, seen from the camera CA (and from the camera CV corresponding to the virtual viewpoint), the +x direction is the near side and the -x direction is the far side. In this case, the image (texture) PA captured by the camera CA should be pasted (mapped) onto the unoccluded near object (the white cylinder CLF) and should not be mapped onto the far object (the gray cylinder CLB), which is invisible due to occlusion. However, as noted above, Patent Document 1 does not compute the front-to-back relationship of subjects (in order to speed up computation), so the camera image is pasted onto all subjects regardless of occlusion, which can produce unnatural synthesized video.

That is, suppose the virtual viewpoint used to obtain the synthesized video is at the position of the camera CV in FIG. 1 (the camera CV is therefore not one that captures real video). The synthesized video at the camera CV should be what a real camera at that position would capture: as in image PV, only the gray cylinder CLB, the far object, is captured, and it is not hidden by the white cylinder CLF. However, if the image of the camera CA is used as the real video for generating the synthesized video, an image PVA is synthesized that corresponds to viewing, from the virtual viewpoint, the state in image PA where the white cylinder CLF in front occludes the gray cylinder CLB, and the image PV that should have been synthesized is not obtained.

In view of the above problems of the prior art, an object of the present invention is to provide a composition device, method, and program that can also cope with the occlusion problem through high-speed processing in line with the framework of Patent Document 1.

To achieve the above object, the present invention is a composition device comprising: a calculation unit that obtains, from each viewpoint image of a multi-viewpoint image, a likelihood image of the region of the photographed object; a backprojection unit that obtains backprojection data by back-projecting each of the likelihood images onto a plurality of backprojection planes arranged in a three-dimensional space; and a drawing unit that synthesizes a free-viewpoint image of the object by rendering the object, back-projecting and drawing the textures of all or some of the viewpoint images of the multi-viewpoint image, in order, onto the plurality of backprojection planes constituting the backprojection data. The invention is also a method and a program corresponding to the device.

According to the present invention, by back-projecting and drawing the textures of all or some of the viewpoint images of the multi-viewpoint image onto the backprojection planes in order, a free-viewpoint image can be synthesized with high-speed processing while coping with occlusion.

Diagram showing a schematic example in which the synthesized video produced by the method of Patent Document 1 leaves room for improvement, depending on the video used for synthesis and the positional relationship of the objects.
Functional block diagram of a composition device according to one embodiment.
Schematic diagram showing that camera calibration makes it possible to associate camera image coordinates with world coordinates.
Diagram showing a schematic example of the background difference method applied by the extraction unit.
Diagram showing a schematic example, according to one embodiment, of the plane group and the order set by the plane group setting unit and the order setting unit, respectively.
Diagram schematically showing the processing of the backprojection unit.
Flowchart of the operation of the drawing unit, the reprojection unit, and the assignment unit according to one embodiment.
Diagram showing a schematic example of the processing by which the reprojection unit obtains a reprojection region.
Diagram showing a schematic example of drawing by the drawing unit.

FIG. 2 is a functional block diagram of a composition device according to one embodiment. As illustrated, the composition device 10 includes a calibration unit 1, an extraction unit 2, a calculation unit 3, a plane group setting unit 4, an order setting unit 5, a backprojection unit 6, a drawing unit 7, a reprojection unit 8, and an assignment unit 9. Of these, the backprojection unit 6, the drawing unit 7, the reprojection unit 8, and the assignment unit 9 constitute a rendering unit 20. As its overall operation, the composition device 10 receives a plurality of camera videos V_{c,t}(u,v) as multi-viewpoint video and outputs a synthesized video SY_t(u,v) at a virtual viewpoint CV (that is, a free viewpoint CV) specified for the camera videos by user input or the like.

Relative to the configuration shown in Patent Document 1, the main additional components (or components that perform additional processing) of the composition device 10 in FIG. 2 are the order setting unit 5, the drawing unit 7, the reprojection unit 8, and the assignment unit 9. The additional processing performed by these functional units, together with the processing in the other functional units that cooperates with it, is what makes it possible in the present invention to obtain a synthesized video SY_t(u,v) that takes occlusion into account.

The notation for video data and the like used in this description is as follows. The input multi-viewpoint video V_{c,t}(u,v) denotes the pixel value at pixel position (u,v) (u and v are integers) at time t (t = 1, 2, 3, ...) of camera c (c = 1, 2, ..., N) among N cameras; that is, the video V is written as a function of the variables (and indices) c, t, u, v. Likewise, the output synthesized video SY_t(u,v) denotes the pixel value at pixel position (u,v) at time t, with the synthesized video SY written as a function. Other data appearing below follow the same convention: the uppercase part is the name of the data function, the subscript lowercase part that follows is an index distinguishing camera c, time t, and virtual plane k (described later), and the trailing (u,v) or (i,j) is an index distinguishing position. Some of these indices may be absent.

The composition device 10 outputs the synthesized video SY_t(u,v) (a synthesized image SY_t(u,v) when time t is fixed) by performing synthesis on the multi-viewpoint video V_{c,t}(u,v) (a multi-viewpoint image V_{c,t}(u,v) when time t is fixed) at each time t = 1, 2, 3, ..., and this synthesis is the same for every time t. The following description therefore treats the processing as that at an arbitrary time t of the input video V_{c,t}(u,v), in some cases without mentioning the time t explicitly. The processing of each unit of the composition device 10 in FIG. 2 is outlined below.

First, the multi-viewpoint image V_{c,t}(u,v) input to the composition device 10 is supplied to the calibration unit 1, the extraction unit 2, and the drawing unit 7. The images of the cameras c in the input multi-viewpoint image are assumed to be time-synchronized across cameras (that is, time t is the synchronized common time across the camera images) and to capture the same object from the shooting position of each camera c.

<Calibration unit 1>
The calibration unit 1 performs so-called camera calibration. Taking the multi-viewpoint image V_{c,t}(u,v) (the image of each camera c) as input, it establishes, for each camera c, the correspondence between the coordinates (x,y,z) of the real-space ground (field) and the camera image V_{c,t}(u,v), and outputs the extrinsic parameters of the resulting calibration data (that is, the camera parameters) to the backprojection unit 6 and the drawing unit 7. The units 2, 6, and 7 can perform their respective processing by using the obtained calibration data. (As is well known, calibration also yields intrinsic parameters for removing lens distortion, and each unit of the composition device 10 may use data whose distortion has been corrected with those intrinsic parameters. The flow of data related to the intrinsic parameters is omitted from FIG. 2.)

If fixed cameras are assumed, the calibration unit 1 needs to perform this calibration only once for each camera c at some time t, and at subsequent times t+1, t+2, ... the calibration data already obtained at time t can be reused. If calibration data is already attached to the multi-viewpoint image V_{c,t}(u,v), the calibration unit 1 may be omitted. (In that case the calibration unit 1 can be regarded as existing outside the composition device 10.)

FIG. 3 schematically shows that camera calibration makes it possible to associate the coordinates (u,v)_c of the image V_{c,t}(u,v) of camera c with a point (x,y,z) in the world coordinate system. The association takes the form that the point (x,y,z) lies on the straight line L(u,v) passing through the camera center of camera c and the coordinates (u,v)_c. The world coordinate system (x,y,z) obtained by the camera calibration is common to all cameras c.
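The relation between a pixel (u,v)_c and the line L(u,v) can be sketched as follows. This is a minimal illustration that assumes an ideal pinhole camera without lens distortion and the convention x_cam = R x_world + t; the function names and the parameters fx, fy, cx, cy, R, t are assumptions made for the example and do not appear in the source.

```python
def pixel_ray(u, v, fx, fy, cx, cy, R, t):
    """Return (center, direction) of the world-space line L(u, v).

    Assumed convention: x_cam = R * x_world + t, ideal pinhole, no distortion.
    R is a 3x3 rotation (list of rows), t a 3-vector.
    """
    # Direction of the viewing ray in camera coordinates.
    d_cam = ((u - cx) / fx, (v - cy) / fy, 1.0)
    # Rotate into world coordinates: direction = R^T * d_cam.
    direction = tuple(sum(R[r][c] * d_cam[r] for r in range(3)) for c in range(3))
    # Camera center in world coordinates: C = -R^T * t.
    center = tuple(-sum(R[r][c] * t[r] for r in range(3)) for c in range(3))
    return center, direction


def point_on_ray(center, direction, s):
    """World point (x, y, z) at parameter s along the line."""
    return tuple(center[i] + s * direction[i] for i in range(3))


def project(point, fx, fy, cx, cy, R, t):
    """Project a world point back to pixel coordinates (u, v)."""
    x, y, z = (sum(R[r][c] * point[c] for c in range(3)) + t[r] for r in range(3))
    return fx * x / z + cx, fy * y / z + cy
```

Every point returned by `point_on_ray` for s > 0 projects back to the same pixel, which is the sense in which a pixel corresponds to a line rather than to a single world point.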

Any existing method may be used for the camera calibration in the calibration unit 1. For example, markers from which feature points or line segments can be detected are placed at known positions in world coordinates (x,y,z), their correspondence with the feature point coordinates or line-segment-related coordinates in the coordinates (u,v)_c of camera c is obtained automatically and/or manually, and the camera parameters are thereby acquired.

<Extraction unit 2>
For the image V_{c,t}(u,v) of each camera c, the extraction unit 2 classifies the image into background and foreground using the background difference method, an existing technique, obtains a mask image M_{c,t}(u,v) whose pixel values are either binary values expressing this classification or foreground likelihoods (which may be given as grayscale levels or the like), and outputs it to the calculation unit 3. The foreground so classified is the region of the object (for example, a person) captured in the image V_{c,t}(u,v).

FIG. 4 shows a schematic example of the extraction processing in the extraction unit 2. Applying the background difference method to an original image [1] at a time t that contains a person as the object yields the image [2] as the background difference result. Image [2] is a mask image in which the foreground pixels corresponding to the person are white (pixel value 1) and the remaining background pixels are black (pixel value 0).

FIG. 4 also shows that, from the mask image [2] and the original image [1], a texture image [3] of the object can be obtained that contains the texture information of the person (object) while the background pixels remain black. Separating the image background from its foreground in this way makes it possible to roughly extract an object such as a person (its image information).

When the extraction unit 2 applies the background difference method, a background image BG_{c,t}(u,v) for the image V_{c,t}(u,v) of each camera c is assumed to be given in advance. If camera c is fixed and the lighting conditions and the like do not change, the background image may be a still image. When the extraction unit 2 obtains the mask image M_{c,t}(u,v) as a foreground likelihood map rather than as a binary map (foreground/background distinction only), any existing method for computing object likelihood may be used; for example, it may be obtained as a saliency map.
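The binary case of the background difference method can be illustrated with a short sketch. It assumes single-channel (grayscale) images stored as nested lists and a fixed difference threshold; the threshold value and the function names are assumptions for the example, and the source does not specify how the difference is evaluated.

```python
def background_subtraction(image, background, threshold=30):
    """Binary mask M(u, v): 1 where the image differs from the background.

    `image` and `background` are equally sized lists of rows of gray levels.
    The fixed threshold is an assumption made for this sketch.
    """
    return [
        [1 if abs(p - b) > threshold else 0 for p, b in zip(row_img, row_bg)]
        for row_img, row_bg in zip(image, background)
    ]


def masked_texture(image, mask):
    """Texture image: original pixel in the foreground, black (0) elsewhere."""
    return [
        [p * m for p, m in zip(row_img, row_mask)]
        for row_img, row_mask in zip(image, mask)
    ]
```

`background_subtraction` corresponds to going from image [1] to mask [2] in FIG. 4, and `masked_texture` to obtaining the texture image [3] from [1] and [2].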

As with the calibration unit 1 described above, if a mask image M_{c,t}(u,v) is already attached to the input image V_{c,t}(u,v), the extraction unit 2 may be omitted from the composition device 10 and treated as an external component.

<Calculation unit 3>
Using the mask image M_{c,t}(u,v) generated by the extraction unit 2, the calculation unit 3 calculates, for each camera c (that is, for each viewpoint image), the alpha value α1_{c,t}(u,v) that the downstream backprojection unit 6 uses for back-projection onto the backprojection plane group, and outputs the alpha value to the backprojection unit 6. The calculated alpha value serves as a parameter that adjusts how much of the object's outline remains in the free-viewpoint image SY_t(u,v) finally synthesized by the downstream drawing unit 7. Specifically, the calculation unit 3 can calculate the alpha value α1_{c,t}(u,v) according to any of the following embodiments.

In the first embodiment, the simplest calculation method, the pixel values of the mask image M_{c,t}(u,v) obtained from the extraction unit 2 are used directly as alpha values. For example, if the mask image is extracted as binary (0 or 1), its pixel values may be used as binary alpha values as they are: the alpha value is 1 in the foreground (pixel value 1) region and 0 in the background (pixel value 0) region. If the mask image is given as a foreground likelihood map, the alpha values may be obtained from the likelihood map either as it is or after normalizing it so that its values lie between 0 and 1 inclusive.
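For the likelihood-map case of the first embodiment, the normalization to the range 0 to 1 could look like the sketch below. It assumes the likelihood is stored as gray levels with a known maximum (255 here); the source does not prescribe a particular normalization, so both the divisor and the clamping are assumptions of this example.

```python
def alpha_from_likelihood(likelihood, max_value=255):
    """Map a gray-level likelihood map to alpha values in [0, 1].

    Dividing by an assumed maximum level and clamping is one possible
    normalization; it is not taken from the source.
    """
    return [
        [min(max(p / max_value, 0.0), 1.0) for p in row]
        for row in likelihood
    ]
```

A binary mask passes through unchanged when `max_value=1` is used, which matches the binary case in which the mask values themselves are the alpha values.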

The first embodiment can also be regarded as a configuration in which the calculation unit 3 is omitted from the composition device 10 (the mask image M_{c,t}(u,v) obtained by the extraction unit 2 is input directly to the backprojection unit 6 as the alpha value α1_{c,t}(u,v)). (If the likelihood map is to be normalized, the normalization may be performed at the extraction unit 2 stage.)

The first embodiment is effective when the extraction unit 2 has correctly extracted the mask image M_{c,t}(u,v) of the foreground subject. In practice, however, when a mask image is extracted from real-world video, it is not uncommon for noise to produce a mask image (in the binary case) in which, for example, part of the subject is missing. The second and third embodiments below are alternatives that are also suited to such situations, where a correct mask image cannot be expected.

The second and third embodiments assume that the foreground and background of the mask image M_{c,t}(u,v) are distinguished in binary form. If the mask image gives the foreground likelihood with more than two levels, such as grayscale levels (or as continuous values), the foreground/background distinction may be obtained by thresholding the likelihood.

In the second embodiment, foreground pixels of the mask image M_{c,t}(u,v) take the same value as in the first embodiment (alpha value 1), while background pixels are assigned a nonzero alpha value τ greater than 0 and less than 1 (0 < τ < 1, for example τ = 0.5), and the alpha value α1_{c,t}(u,v) is calculated accordingly.

According to the second embodiment, when the subsequent backprojection unit 6 performs back-projection and superimposes the alpha values α1_{c,t}(u,v), the background value τ is not zero, so the boundary region between foreground and background tends to survive somewhat more. This moves the processing in a direction that reduces the adverse effect of inaccuracies in the mask image M_{c,t}(u,v) extracted by the extraction unit 2.
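The second embodiment, together with the thresholding of a likelihood map into a binary mask, can be sketched as follows. The function names and the default threshold of 0.5 are assumptions for the example; only the rule "foreground 1, background τ with 0 < τ < 1" comes from the text.

```python
def binarize(likelihood, threshold=0.5):
    """Threshold a likelihood map in [0, 1] into a binary mask (assumed threshold)."""
    return [[1 if p >= threshold else 0 for p in row] for row in likelihood]


def alpha_second_embodiment(mask, tau=0.5):
    """Alpha is 1 for foreground pixels and the constant tau for background pixels."""
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must satisfy 0 < tau < 1")
    return [[1.0 if m == 1 else tau for m in row] for row in mask]
```

Because no background pixel receives alpha 0, a foreground pixel that one camera's mask missed is attenuated rather than removed outright when the alpha values of several cameras are superimposed.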

The third embodiment makes the background alpha value, which was a fixed value τ regardless of position in the second embodiment, vary with position: the alpha value α1_{c,t}(u,v) is calculated so that the alpha value assigned to a background pixel decreases as the distance from the foreground/background boundary of the mask image M_{c,t}(u,v) to that pixel increases. Specifically, letting d be the distance (for example, the perpendicular distance) from the background pixel whose alpha value is to be determined to the nearby mask boundary, the alpha value α can be calculated, for example, by the following equation:
(1) α = θ · f(d)
Here, f(d) is a monotonically decreasing function of d that returns an alpha value, and θ is the attenuation rate of the alpha value.
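Equation (1) can be illustrated with the following sketch. It makes several assumptions that are not in the source: d is approximated by the Euclidean distance to the nearest foreground pixel (found by brute force), f(d) = 1 / (1 + d) is used as one example of a monotonically decreasing function, and θ defaults to 0.9.

```python
import math


def alpha_third_embodiment(mask, theta=0.9, f=lambda d: 1.0 / (1.0 + d)):
    """Alpha = 1 in the foreground, theta * f(d) in the background.

    d is taken as the distance to the nearest foreground pixel, which is an
    assumed stand-in for the distance to the mask boundary.
    """
    foreground = [
        (i, j) for i, row in enumerate(mask) for j, m in enumerate(row) if m == 1
    ]
    alpha = []
    for i, row in enumerate(mask):
        out = []
        for j, m in enumerate(row):
            if m == 1:
                out.append(1.0)
            else:
                # Brute-force nearest-foreground distance; infinite if no foreground.
                d = min(
                    (math.hypot(i - a, j - b) for a, b in foreground),
                    default=math.inf,
                )
                out.append(theta * f(d))
        alpha.append(out)
    return alpha
```

The brute-force search costs time proportional to the number of background pixels times the number of foreground pixels, which illustrates why this embodiment is more expensive than a constant τ; a distance transform would be the usual way to reduce that cost.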

Compared with the second embodiment, the third embodiment allows the subject (object) to be rendered as free-viewpoint video with a more natural appearance, but because the distance d to the mask boundary near each pixel must be computed, the amount of computation and the computation time tend to increase.

Beyond the embodiments described above, various other calculation methods can be adopted, for example any method that sets the alpha value of foreground pixels to 1 and that of background pixels to a value less than 1.

<Plane group setting unit 4>
The plane group setting unit 4 sets a backprojection plane group P, a plurality of surfaces that serve as projection targets for the downstream backprojection unit 6 and other units, in a three-dimensional model space that models the world coordinate system xyz of the multi-viewpoint image V_{c,t}(u,v), and outputs the setting result (that is, the coordinate positions and extents that the planes of the group P occupy in the three-dimensional model space) to the order setting unit 5. As described in detail later, the backprojection plane group P is used as the reference when the input multi-viewpoint image V_{c,t}(u,v) is back-projected into the three-dimensional model space.

Specifically, the plane group setting unit 4 can set the backprojection plane group P according to the position (and line-of-sight direction) of the virtual viewpoint CV in the three-dimensional model space, specified by user input or the like. FIG. 5 shows a schematic example of one such embodiment. (FIG. 5 also serves as a schematic example of one embodiment of the order setting unit 5 described later.) In one embodiment, given the position of the virtual viewpoint CV in the three-dimensional model space and its line-of-sight direction Lc (that is, the camera axis Lc of the virtual viewpoint CV), the plane group P = {P_{t,k} | k = 1, 2, ..., K} can be set as K planes of predetermined size that the straight line of the camera axis Lc passes through, each plane P_{t,k} making a predetermined angle with the camera axis Lc and all planes being parallel to one another.

Here, the following embodiments are possible for the orientation of each plane P_{t,k} in the three-dimensional model space. The angle that each plane P_{t,k} forms with the camera axis Lc may be any predetermined angle; in one embodiment it may be a right angle. Alternatively, instead of orienting each plane P_{t,k} relative to the camera axis Lc, the orientation may be set relative to the xyz coordinates defined in the three-dimensional model space (identical to the world coordinate system xyz). For example, taking the xy plane to be the ground (field), a plane group P parallel to the xy plane may be set, or a plane group P parallel to the yz plane or the zx plane may be set.

Note that the three-dimensional model space is a model of the world coordinate system xyz described for the calibration unit 1 with reference to FIG. 3 and elsewhere (that is, the world coordinate system xyz of the space in which the multi-viewpoint images V_{c,t}(u,v) were captured), built for the rendering that yields the composite image SY_t(u,v). Because each point in the world coordinate system xyz corresponds one-to-one to a point in the three-dimensional model space, the description below also refers to the coordinate system of the three-dimensional model space as "xyz", the same as the world coordinate system.

In one embodiment, the plane group P may be set so that adjacent planes among the mutually parallel planes P_{t,k} are separated by a predetermined distance d. In another embodiment, the distance between adjacent planes P_{t,k} need not be the constant value d and may vary. For example, the spacing may be made smaller for planes P_{t,k} located near positions in the three-dimensional model space where the target is likely to exist.
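The plane placement described above can be sketched as follows. This is a minimal illustration, not the implementation of the plane group setting unit 4: it assumes planes perpendicular to the camera axis Lc, represents each plane only by a point on the axis and a unit normal, and introduces a hypothetical parameter `near` (distance from the virtual viewpoint CV to the first plane) that does not appear in the description. Plane size is omitted.

```python
import math

def normalize(v):
    """Scale a 3-vector to unit length."""
    length = math.sqrt(sum(x * x for x in v))
    return tuple(x / length for x in v)

def make_plane_group(cv_pos, cam_axis, near, spacings):
    """Place planes perpendicular to the camera axis Lc of the virtual viewpoint CV.

    cv_pos:   position of CV in the model space xyz
    cam_axis: direction of the camera axis Lc (any non-zero length)
    near:     ASSUMED parameter, distance from CV to the first plane
    spacings: gaps between adjacent planes (all equal to d for uniform spacing);
              len(spacings) + 1 planes are produced
    Returns a list of (point_on_plane, unit_normal) tuples.
    """
    axis = normalize(cam_axis)
    planes = []
    dist = near
    for gap in [0.0] + list(spacings):
        dist += gap
        point = tuple(p + dist * a for p, a in zip(cv_pos, axis))
        planes.append((point, axis))
    return planes

# K = 3 planes with uniform spacing d = 0.5
planes = make_plane_group((0.0, 0.0, 0.0), (0.0, 0.0, 2.0), near=1.0, spacings=[0.5, 0.5])
```

Passing unequal values in `spacings` gives the variable-spacing embodiment, for example smaller gaps where the target is likely to be.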

Note that in the present invention, the plane group P set by the plane group setting unit 4 fulfils the role played by three-dimensional voxels (point clouds) in the conventional visual volume intersection method, but does so in two-dimensional regions (the planes of the plane group P) without requiring depth information. This suppresses memory consumption and enables high-speed synthesis of the free-viewpoint composite image SY_t(u,v). Accordingly, the range over which the plane group P is arranged in the three-dimensional model space xyz need only cover the range in which the target can exist, the target being the subject whose region the extraction unit 2 extracts from the multi-viewpoint images V_{c,t}(u,v) as the mask M_{c,t}(u,v) (and which the drawing unit 7 ultimately renders from the free viewpoint). Information on this range is given in advance as information linked to the multi-viewpoint images V_{c,t}(u,v), and the plane group setting unit 4 sets the plane group P so as to cover it. (The linked information may additionally be given in advance in association with the calibration information of the calibration unit 1.) In other words, the memory-related settings of the plane group P, such as the number of planes K, the inter-plane distance d, and the size of each plane, are chosen so as to cover this range.

Note that when a user or the like specifies the position (and orientation) of the virtual viewpoint CV, any existing information input interface may be used. For example, the values may be entered directly as numbers, or obtained from an existing gaze position detection technique (detecting the position of the user's pupil relative to a pupil-imaging camera). The values may also be computed from mouse operations or touch-panel operations, or chosen by menu selection from a set of candidate viewpoint positions.

<Order setting unit 5>
The order setting unit 5 assigns an order to the planes P_{t,k} of the plane group P set by the plane group setting unit 4 as described above, and outputs the ordered plane group P = {P_{t,k} | k = 1, 2, …, K} to the drawing unit 7. The drawing unit 7, described later, processes the plane group P in this order.

Specifically, the order setting unit 5 can order the plane group P = {P_{t,k} | k = 1, 2, …, K} set by the plane group setting unit 4 according to its relationship to the position of the virtual viewpoint CV specified by user input or the like in the plane group setting unit 4. In a preferred embodiment, the planes are ordered from nearest to farthest from the virtual viewpoint CV. In the following, whenever notation such as P = {P_{t,k} | k = 1, 2, …, K} is used, the index k that distinguishes the planes denotes the order assigned by the order setting unit 5. The nearest-first order is used as the running example: the plane P_{t,k} is the plane that is k-th nearest to the virtual viewpoint CV, so a smaller k means the plane lies nearer the virtual viewpoint CV and a larger k means it lies farther back as seen from the virtual viewpoint CV.

For example, in the schematic example of FIG. 5, the plane group setting unit 4 sets a plane group P perpendicular to the camera axis direction Lc of the specified virtual viewpoint CV (here, K = 3 planes), and the order setting unit 5 assigns the order k = 1, 2, 3 from nearest to farthest from the virtual viewpoint CV. As a result, the three planes P_{t,1}, P_{t,2}, P_{t,3} are set in the three-dimensional model space xyz in nearest-first order, separated from one another by the distance d. Together, the three planes cover the range in which the target OB can exist.
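The nearest-first ordering can be illustrated with a short sketch. It assumes each plane is given as a (point, unit normal) pair, a representation chosen here for illustration, and measures nearness as the perpendicular distance from the virtual viewpoint CV to the plane; the description does not fix a particular distance measure.

```python
def order_planes(planes, cv_pos):
    """Sort planes, given as (point_on_plane, unit_normal), nearest to CV first."""
    def distance(plane):
        point, normal = plane
        # Perpendicular distance from CV to the plane (ASSUMED nearness measure)
        return abs(sum((p - c) * n for p, c, n in zip(point, cv_pos, normal)))
    return sorted(planes, key=distance)

normal = (0.0, 0.0, 1.0)
unordered = [
    ((0.0, 0.0, 2.0), normal),
    ((0.0, 0.0, 1.0), normal),
    ((0.0, 0.0, 1.5), normal),
]
ordered = order_planes(unordered, cv_pos=(0.0, 0.0, 0.0))
# ordered[0], ordered[1], ordered[2] play the roles of P_{t,1}, P_{t,2}, P_{t,3}
```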

<Rendering unit 20>
The rendering unit 20 uses the calibration data obtained by the calibration unit 1, the alpha values α1_{c,t}(u,v) calculated by the calculation unit 3, and the ordered plane group P = {P_{t,k} | k = 1, 2, …, K} set by the order setting unit 5 to render the target captured in the multi-viewpoint images V_{c,t}(u,v), which are the input to the composition device 10, from a free viewpoint into the display area, thereby obtaining the composite image SY_t(u,v).

More specifically, the rendering unit 20 can be implemented in hardware using, for example, a GPU. The ordered plane group P = {P_{t,k} | k = 1, 2, …, K} is input to the GPU's vertex shader in the order k, and a free-viewpoint image of the target as seen from the virtual viewpoint CV is rendered on the display based on the virtual viewpoint information (viewpoint position coordinates and line-of-sight direction) specified by the user or the like in the plane group setting unit 4, yielding the composite image SY_t(u,v). (Rendering may be performed per pixel by the GPU's pixel shader.) At this time, 3DCG data representing the background other than the target is also loaded and rendered in parallel by a known method, so that the final free-viewpoint image can be synthesized. The 3DCG background data may be synthesized by converting the predetermined background prepared for background subtraction in the extraction unit 2 into its appearance from the virtual viewpoint CV (for example, by a planar projective transformation for each planar part).

The following describes the backprojection unit 6, the drawing unit 7, the reprojection unit 8, and the assigning unit 9, which perform the element processes that realize the rendering in the rendering unit 20. The individual processing of units 6, 7, 8, and 9 is described first. Because units 7, 8, and 9 additionally involve iterative and update processing performed in cooperation with one another, their operation flow is then described with reference to FIG. 7.

<Backprojection unit 6>
FIG. 6 schematically shows the processing of the backprojection unit 6 for the case of three cameras (c = 1, 2, 3) and a plane group P of three planes (k = 1, 2, 3). The backprojection unit 6 backprojects the alpha values α1_{c,t}(u,v) of each camera c (c = 1, 2, …, N) obtained from the calculation unit 3 onto each plane P_{t,k} of the plane group P = {P_{t,k} | k = 1, 2, …, K} obtained by the order setting unit 5 and accumulates them, thereby obtaining the accumulated alpha value α2_{t,k}(i,j) on each plane P_{t,k}.

Here, the following notation is used to keep the different data clearly distinguished. The alpha value obtained by the calculation unit 3 is named "α1", and the alpha value obtained by the backprojection unit 6 by accumulating these over the cameras c is named "α2". The alpha value α1 of the calculation unit 3 corresponds to position (u,v) in the input image V_{c,t}(u,v) and is therefore written with the pixel position (u,v). The alpha value α2 of the backprojection unit 6, by contrast, is given as a distribution on each plane P_{t,k} placed in the xyz space, so a position on that plane is written (i,j) to distinguish it from (u,v). Unlike the pixel position (u,v), the position (i,j) is in general specified by real numbers.

As also shown schematically in FIG. 6, for each backprojection plane P_{t,k} specified by the index k, the backprojection unit 6 can obtain the accumulated alpha value α2_{t,k}(i,j) on that plane by the following procedures 1 to 3. Unlike the drawing unit 7 described later, the backprojection unit 6 may carry out procedures 1 to 3 for the backprojection planes P_{t,k} in any order, not only the order given by the index k, and may process several planes P_{t,k} in parallel.

(Procedure 1) Place the alpha value image α1_{c,t}(u,v) in the three-dimensional model space xyz at the position corresponding to the placement of the camera c that captured it.

(Procedure 2) Backproject the alpha value image α1_{c,t}(u,v) placed in the three-dimensional model space xyz from the camera center of the corresponding camera c toward the plane P_{t,k}, thereby obtaining, for each pixel position (u,v) of the alpha value image, its backprojected position (i_{[u,v]}, j_{[u,v]})_c on the plane P_{t,k}. The region of the space xyz swept by this backprojection is the cone CN_{c,t} whose apex is the camera center of camera c and whose base (a cross-section of the cone) is the alpha value image α1_{c,t}(u,v), assuming the base region is defined by binary values. In FIG. 6, the cones CN_{c,t} for c = 1, 2, 3 are drawn schematically with broken lines. The result of projecting onto the plane P_{t,k} is the intersection "P_{t,k} ∩ CN_{c,t}".

(Procedure 3) At each backprojected position (i_{[u,v]}, j_{[u,v]})_c obtained above (which, through the placement of the plane P_{t,k}, is also a position in the xyz space), accumulate the alpha values α1_{c,t}(u,v) of the corresponding cameras c (those among c = 1, 2, …, N for which the backprojection is possible), that is, the alpha values at the projection sources, to obtain the alpha value α2_{t,k}(i,j). For example, when the alpha values α1_{c,t}(u,v) are binary masks, α2_{t,k}(i,j) is 1 (target) where every camera c has the value 1, and 0 (not part of the target region) where even one camera has the value 0. When the calculation unit 3 sets α1_{c,t}(u,v) to continuous values between 0 and 1, the boundary fades gradually toward 0.
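Procedures 1 to 3 can be sketched for a single point on a plane as follows. This is an illustration under several assumptions, not the disclosed implementation: each camera is modeled by a hypothetical 3x4 pinhole projection matrix, the accumulation is taken to be a product (which reproduces the binary-mask behavior described in procedure 3), and the computation runs from the plane side, projecting the plane point into each image and sampling α1 there, rather than pushing each pixel out to the plane. Nearest-neighbour sampling is also an assumption.

```python
def project(P, xyz):
    """Pinhole projection (x, y, z) -> (u, v) with a 3x4 matrix P (nested lists)."""
    X = (*xyz, 1.0)
    a, b, w = (sum(p * x for p, x in zip(row, X)) for row in P)
    if w <= 0:
        return None  # point is behind the camera
    return a / w, b / w

def sample(alpha, uv):
    """Nearest-neighbour lookup in alpha[v][u]; None when outside the image."""
    if uv is None:
        return None
    u, v = round(uv[0]), round(uv[1])
    if 0 <= v < len(alpha) and 0 <= u < len(alpha[0]):
        return alpha[v][u]
    return None

def accumulate_alpha(xyz, cameras):
    """alpha2 at one plane point: product of alpha1 over cameras that see the point."""
    result, seen = 1.0, False
    for P, alpha1 in cameras:
        a = sample(alpha1, project(P, xyz))
        if a is not None:  # skip cameras for which backprojection is not possible
            result *= a
            seen = True
    return result if seen else 0.0

# Two hypothetical cameras looking along +z; the second is shifted by 1 along x
P1 = [[2, 0, 1, 0], [0, 2, 1, 0], [0, 0, 1, 0]]
P2 = [[2, 0, 1, -2], [0, 2, 1, 0], [0, 0, 1, 0]]
alpha_a = [[0.0, 0.0, 0.0], [0.0, 1.0, 1.0], [0.0, 0.0, 0.0]]
alpha_b = [[0.0, 0.0, 0.0], [0.5, 0.0, 0.0], [0.0, 0.0, 0.0]]
cameras = [(P1, alpha_a), (P2, alpha_b)]

inside = accumulate_alpha((0.5, 0.0, 1.0), cameras)   # seen by both cameras
outside = accumulate_alpha((0.5, 0.5, 1.0), cameras)  # background in both
```

Evaluating `accumulate_alpha` over a grid of points on each plane P_{t,k} would give a sampled version of α2_{t,k}(i,j); because the planes are independent, they can be processed in any order or in parallel.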

<Drawing unit 7>
The drawing unit 7 synthesizes the composite image SY_t(u,v) as the free-viewpoint image seen from the virtual viewpoint CV specified by user input or the like in the plane group setting unit 4. It renders the target captured in the multi-viewpoint images V_{c,t}(u,v) as the foreground texture TX_{c,t}(u,v) as seen from the virtual viewpoint CV, and combines it with the background BG_t(u,v) as seen from the virtual viewpoint CV, generated by the known method already described, to obtain the composite image SY_t(u,v).

Specifically, when drawing the foreground texture TX_{c,t}(u,v), the drawing unit 7 draws the backprojection planes P_{t,k} in the order k of the plane group P = {P_{t,k} | k = 1, 2, …, K} obtained by the order setting unit 5. Of all N cameras c = 1, 2, …, N, the multi-viewpoint images V_{c,t}(u,v) used for drawing are those of the n cameras (n ≤ N) judged to be close in position (and orientation) to the specified virtual viewpoint CV. Only these n are used because, for example, the image V_{c,t}(u,v) of a camera c facing the opposite way from the virtual viewpoint CV sees the target from the side opposite the virtual viewpoint CV and is therefore unlikely to contain the texture needed for drawing.

For example, when the cameras c are arranged on a circle or sphere surrounding the target and face the target, and the virtual viewpoint CV is likewise set near that circle or sphere looking toward the target, closeness in position and closeness in orientation go together, so the n cameras nearest in either position or orientation may be selected. When the cameras c are arranged arbitrarily, the n cameras c judged closest to the virtual viewpoint CV may be selected by taking both position and orientation into account.
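One way to select the n cameras is sketched below. The score used here, the Euclidean gap between positions plus a weighted angle between viewing directions, is an assumption made for illustration; the description only requires that position and orientation both be considered when the layout is arbitrary. The weight `w_dir` is a hypothetical parameter.

```python
import math

def closeness(cam, cv, w_dir=1.0):
    """Smaller is closer. cam and cv are (position, view_direction) pairs.

    ASSUMED score: position gap + w_dir * angle between view directions (radians).
    """
    (cam_pos, cam_dir), (cv_pos, cv_dir) = cam, cv
    gap = math.dist(cam_pos, cv_pos)
    cos_angle = sum(a * b for a, b in zip(cam_dir, cv_dir)) / (
        math.hypot(*cam_dir) * math.hypot(*cv_dir)
    )
    angle = math.acos(max(-1.0, min(1.0, cos_angle)))
    return gap + w_dir * angle

def select_cameras(cameras, cv, n):
    """Return the indices of the n cameras closest to the virtual viewpoint CV."""
    ranked = sorted(range(len(cameras)), key=lambda c: closeness(cameras[c], cv))
    return ranked[:n]

# Four cameras on a unit circle around the origin, all facing the center
cameras = [
    ((1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)),
    ((0.0, 1.0, 0.0), (0.0, -1.0, 0.0)),
    ((-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)),
    ((0.0, -1.0, 0.0), (0.0, 1.0, 0.0)),
]
cv = ((0.9, 0.1, 0.0), (-0.9, -0.1, 0.0))
chosen = select_cameras(cameras, cv, n=2)
```

In this circular layout the camera on the far side of the target (index 2) ranks last, matching the observation that a camera facing the opposite way is unlikely to hold useful texture.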

In the following, for the sake of explanation, the indices of the n cameras judged to be close in position (and orientation) are taken to be c = 1, 2, 3, …, n.

<Reprojection unit 8>
For the drawing of a backprojection plane P_{t,k} by the drawing unit 7, the reprojection unit 8 sets, within the multi-viewpoint image V_{c,t}(u,v) of each of the n cameras, the region S_{c,t,k}(u,v) that is actually used for the drawing. The reprojection unit 8 obtains the regions S_{c,t,k}(u,v) and outputs them to the drawing unit 7 and the assigning unit 9.

<Assigning unit 9>
For each pixel position (u,v) of the region S_{c,t,k}(u,v) (a part of the image V_{c,t}(u,v)) that the drawing unit 7 uses for drawing, information reflecting the drawing history, such as whether the pixel has already been used to draw the texture TX_{c,t}(u,v), is held and updated as the control value d_{c,t,k-1}(u,v) while the backprojection planes P_{t,k} are drawn in the order k. The drawing unit 7 draws taking this control value d_{c,t,k-1}(u,v) into account. The assigning unit 9 obtains the control value d_{c,t,k-1}(u,v) considered in the drawing and outputs it to the drawing unit 7.

For each pixel position (u,v), the assigning unit 9 obtains the control value d_{c,t,k-1}(u,v) so that it reflects the drawing history accumulated as the backprojection planes P_{t,k} are drawn in the order k. For example, the control value d_{c,t,k-1}(u,v) may be the number of times the pixel has been drawn. The description below assumes that d_{c,t,k-1}(u,v) is this drawing count.

This concludes the outline of the individual processing of the drawing unit 7, the reprojection unit 8, and the assigning unit 9. FIG. 7 is a flowchart of one embodiment of the operation in which the three units cooperate to draw the texture TX_{c,t}(u,v). The operation of units 7, 8, and 9 is described in detail below, step by step through FIG. 7. The flow in FIG. 7 is a loop, which expresses drawing the backprojection planes P_{t,k} in the order k = 1, 2, …, K. In the description of FIG. 7, the index k therefore serves both as the identifier of the backprojection plane P_{t,k} and as the iteration counter of the loop.

When the flow of FIG. 7 starts, the counter k is set to its initial value k = 1, and the process proceeds to step S1.

<Step S1>
In step S1, the assigning unit 9 sets the initial value d_{c,t,0}(u,v) of the control value d_{c,t,k-1}(u,v), that is, its value for the initial counter k = 1, for each pixel position (u,v) of the image V_{c,t}(u,v) of each camera c = 1, 2, …, n used for drawing, and the process proceeds to step S2. (Regarding the notation "initial value d_{c,t,0}(u,v)": no backprojection plane P_{t,k} with k = 0 exists, but as explained below, the control value d_{c,t,k-1}(u,v) (k ≥ 1) is the control value used when drawing on the backprojection plane P_{t,k} (k ≥ 1), so the initial value d_{c,t,0}(u,v) is the control value used for the first backprojection plane P_{t,1}, with k = 1.)

Since nothing has been drawn yet at step S1, the assigning unit 9 may set the initial value d_{c,t,0}(u,v) to 0 (zero drawings) for all cameras c = 1, 2, …, n and all pixel positions (u,v).

<Step S2>
In step S2, the reprojection unit 8 sets the reprojection region S_{c,t,k}(u,v) as a partial region of the image V_{c,t}(u,v) of each camera c = 1, 2, …, n used for drawing, and the process proceeds to step S3.

FIG. 8 shows a schematic example of the processing by which the reprojection unit 8 obtains the reprojection region S_{c,t,k}(u,v). In FIG. 8, the processing indicated by arrow L1 from [1] to [2] is the backprojection by the backprojection unit 6 already described with reference to FIG. 6 and elsewhere; the alpha value α2_{t,k}(i,j) backprojected onto the backprojection plane P_{t,k} is shown as an elliptical region. The processing indicated by arrow L2 from [2] to [3] is a schematic example of the processing by the reprojection unit 8.

To explain the processing of the reprojection unit 8 and its significance, terms are defined as follows. As already described with the schematic example of FIG. 6 and elsewhere, the set {α2_{t,k}(i,j) | k = 1, 2, …, K}, that is, the backprojected alpha values obtained by the backprojection unit 6, arranged in the space xyz as distributions on all of the backprojection planes P_{t,k}, is the present invention's own counterpart to the visual hull of the conventional visual volume intersection method. This arranged data {α2_{t,k}(i,j) | k = 1, 2, …, K}, which corresponds to the visual hull, is called the "backprojection data".

The alpha value α2_{t,k}(i,j) of a single plane P_{t,k}, taken from the backprojection data {α2_{t,k}(i,j) | k = 1, 2, …, K} for some k, corresponds to a "cross-section" obtained by slicing the target (a person or the like) whose shape the backprojection data represents along the plane P_{t,k}. (In [2] of FIG. 8, this cross-section is shown schematically as an ellipse.)

The reprojection unit 8 reprojects the alpha value α2_{t,k}(i,j), as the cross-section on the plane P_{t,k} specified by the index k (processing order k), onto the image plane (u,v)_c of each camera c = 1, 2, …, n, thereby obtaining the reprojection region S_{c,t,k}(u,v) within the corresponding image V_{c,t}(u,v). The reprojection region S_{c,t,k}(u,v) may be obtained by reprojecting the part of α2_{t,k}(i,j) whose value is greater than 0 and is therefore judged to be foreground. Clearly, the reprojection performed by the reprojection unit 8 is the inverse of the backprojection (u,v) → (x,y,z) performed by the backprojection unit 6, namely the ordinary projection (x,y,z) → (u,v): it finds the region that would be formed on the image plane (u,v)_c if the cross-sectional region of α2_{t,k}(i,j) were photographed by camera c.

As is clear from how the reprojection works, the reprojection region S_{c,t,k}(u,v) obtained in each camera image V_{c,t}(u,v) contains the texture needed to draw the cross-section of the target on the backprojection plane P_{t,k}. In the next step S3, the drawing unit 7 therefore draws this cross-section.
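The reprojection of step S2 can be sketched as follows. The sketch assumes that the cross-section on a plane is available as a list of sampled 3D points with their α2 values, and that each camera is described by a hypothetical 3x4 pinhole projection matrix; neither representation is specified in the description. It returns the set of pixels hit by foreground samples as a stand-in for S_{c,t,k}(u,v).

```python
def reproject_region(plane_samples, P, width, height):
    """Return the set of pixels (u, v) covered by plane samples with alpha2 > 0.

    plane_samples: list of ((x, y, z), alpha2) pairs sampled on one plane P_{t,k}
    P:             3x4 pinhole projection matrix of camera c (ASSUMED camera model)
    """
    region = set()
    for xyz, alpha2 in plane_samples:
        if alpha2 <= 0:
            continue  # not foreground, so not part of the cross-section
        X = (*xyz, 1.0)
        a, b, w = (sum(p * x for p, x in zip(row, X)) for row in P)
        if w <= 0:
            continue  # behind the camera
        u, v = round(a / w), round(b / w)
        if 0 <= u < width and 0 <= v < height:
            region.add((u, v))
    return region

P = [[2, 0, 1, 0], [0, 2, 1, 0], [0, 0, 1, 0]]
samples = [
    ((0.0, 0.0, 1.0), 1.0),   # foreground
    ((0.5, 0.0, 1.0), 0.5),   # soft boundary, still foreground
    ((0.5, 0.5, 1.0), 0.0),   # background on the plane
    ((5.0, 0.0, 1.0), 1.0),   # projects outside the 3x3 image
]
region = reproject_region(samples, P, width=3, height=3)
```

With sparse samples the resulting pixel set can contain holes; a real implementation would more likely rasterize the cross-section, for example on the GPU.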

<Step S3>
In step S3, the drawing unit 7 backprojects (the projection (u,v)_c → (x,y,z)) the texture of the reprojection region S_{c,t,k}(u,v) in each camera image V_{c,t}(u,v) (c = 1, 2, …, n) onto the backprojection plane P_{t,k} specified by the index k, thereby drawing the texture on the backprojection plane P_{t,k}, and the process proceeds to step S4. (The drawn area is the cross-sectional region of the alpha value α2_{t,k}(i,j).)

FIG. 9 shows a schematic example in which this drawing is performed with the images of two cameras, c = 1, 2.

In this drawing, the texture TX_{t,k}(i,j) at a given position (i,j) on the backprojection plane P_{t,k} is drawn from the corresponding positions (u_{[i,v]}, v_{[i,v]})_c in the reprojection regions S_{c,t,k}(u,v) of several camera images V_{c,t}(u,v) (c = 1, 2, …, n). The drawing unit 7 therefore decides how the pixels from the cameras c used for the drawing are apportioned to obtain the texture TX_{t,k}(i,j), as shown schematically in the weighted sum below, and draws accordingly. In this sum, "E_c" is the coefficient that gives the share of the pixel "V_{c,t}(u_{[i,v]}, v_{[i,v]})" of camera c.
TX_{t,k}(i,j) = Σ_c E_c * V_{c,t}(u_{[i,v]}, v_{[i,v]})

Various embodiments are possible for this apportioned drawing.

In the first embodiment, the distribution coefficient E_c of camera c is made larger the closer the position (and orientation) of camera c is to the position (and orientation) of the specified virtual viewpoint CV, so that pixels of the nearer cameras are weighted more heavily. Closeness of camera position (and orientation) may be evaluated in the same way as when the cameras c = 1, 2, …, n close to the virtual viewpoint CV were chosen.

In the second embodiment, the distribution coefficient E_c of camera c is made smaller the larger the control value d_{c,t,k-1}(u_{[i,v]}, v_{[i,v]}) given by the assigning unit 9 for the pixel "V_{c,t}(u_{[i,v]}, v_{[i,v]})" of camera c used in the drawing, that is, the more times that pixel has already been used for drawing. Under the second embodiment, a pixel that has been used for drawing has less influence on subsequent drawing. As a special case, setting E_c = 0 once a pixel has been used even once flags the pixel so that it is not used for drawing again. Similarly, E_c may be set to 0 once a predetermined maximum count is reached.

Note that in the second embodiment, the distribution coefficient E_c of camera c is a coefficient E_c(u_{[i,v]}, v_{[i,v]}) defined for each drawing-source position (u_{[i,v]}, v_{[i,v]}) corresponding to the drawing-destination position (i,j). In the first drawing at k = 1, nothing has been drawn yet (the control value d_{c,t,0} is the initial value given in step S1), so the distribution coefficients E_c according to the second embodiment are all equal.
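The second embodiment can be sketched as a function from a pixel's use count to a raw coefficient. The reciprocal form 1/(1 + d) and the names `usage_weight` and `max_uses` are assumptions chosen for illustration; the text only requires that the coefficient shrink as the count grows, with an optional cutoff to zero.

```python
def usage_weight(use_count, max_uses=None):
    """Raw coefficient for one source pixel given how often it was drawn.

    Assumption: the weight decays as 1 / (1 + use_count). If max_uses is
    given, the pixel is excluded (weight 0) once that many uses is reached;
    max_uses=1 reproduces the "never reuse a drawn pixel" special case.
    """
    if max_uses is not None and use_count >= max_uses:
        return 0.0
    return 1.0 / (1.0 + use_count)


# Before any drawing (count 0) every pixel gets the same weight of 1.0,
# which matches the remark that all coefficients are equal at k = 1.
first_pass = [usage_weight(0) for _ in range(3)]
```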

The first and second embodiments of the distribution can also be combined. The distribution coefficients E_c obtained by the first and/or second embodiment should be normalized so that their sum over all cameras c = 1, 2, ..., n is 1 before the drawing is performed.
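One way to picture the combination and normalization is the sketch below, which multiplies a proximity weight by a usage weight per camera, normalizes the products so they sum to 1, and evaluates the weighted sum for a single texture position. Combining the two embodiments by multiplication, and returning 0 when every camera is excluded, are assumptions of this example rather than something the text prescribes.

```python
def normalize(weights):
    """Scale weights so they sum to 1; all-zero input stays all zero."""
    total = sum(weights)
    if total == 0:
        return [0.0] * len(weights)
    return [w / total for w in weights]


def blend_texture(pixel_values, proximity, usage):
    """Weighted sum TX = sum_c E_c * V_c for one position (i, j).

    pixel_values: V_c, one scalar sample per camera.
    proximity, usage: raw per-camera weights (first / second embodiment).
    Assumption: the two weights are combined by multiplication.
    """
    raw = [p * u for p, u in zip(proximity, usage)]
    coeffs = normalize(raw)
    value = sum(e * v for e, v in zip(coeffs, pixel_values))
    return value, coeffs


# Two cameras with equal weights: the result is the plain average.
value, coeffs = blend_texture([100.0, 200.0], [1.0, 1.0], [1.0, 1.0])
```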

That is, as a concrete way of taking the weighted sum with the distribution coefficients E_c described above, the textures may be superimposed by alpha blending. (A weighted sum in which each distribution coefficient E_c is set in the range 0 ≤ E_c ≤ 1 and the sum Σ_c E_c of the coefficients used to obtain the texture TX_{t,k}(i,j) is normalized to 1 corresponds to alpha blending.)

In this case, an alpha value α_{c,t,k-1}(u_{[i,v]}, v_{[i,v]}) is obtained from the control value d_{c,t,k-1}(u_{[i,v]}, v_{[i,v]}) using a predetermined function g as follows, and alpha blending is performed using that alpha value. Much as in the second embodiment for the distribution coefficient E_c, the function g should increase the transparency as the control value d_{c,t,k-1}(u_{[i,v]}, v_{[i,v]}), that is, the number of times the pixel has already been used for drawing, becomes larger.
α_{c,t,k-1}(u_{[i,v]}, v_{[i,v]}) = g(d_{c,t,k-1}(u_{[i,v]}, v_{[i,v]}))

Here, the alpha values α_{c,t,k-1}(u_{[i,v]}, v_{[i,v]}) obtained above may be normalized so that their sum over all cameras c = 1, 2, ..., n is 1.
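The mapping g from control value to alpha value is left open in the text. A minimal sketch, assuming an exponential decay g(d) = decay^d and treating alpha as opacity (so a smaller alpha means more transparent), might look like this; the names `alpha_from_count` and `normalized_alphas` and the default `decay=0.5` are hypothetical.

```python
def alpha_from_count(use_count, decay=0.5):
    """Hypothetical g: opacity falls geometrically with each prior use."""
    return decay ** use_count


def normalized_alphas(use_counts, decay=0.5):
    """Alpha per camera for one source position, scaled to sum to 1."""
    alphas = [alpha_from_count(d, decay) for d in use_counts]
    total = sum(alphas)  # always positive because decay > 0
    return [a / total for a in alphas]


# Camera 1's pixel is unused, camera 2's pixel was already drawn once.
alphas = normalized_alphas([0, 1])
```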

For example, for the cameras c = 1, 2 in FIG. 7, when alpha blending is performed while also taking into account the distances m1, m2 from the cameras (as in the first embodiment for the distribution coefficient E_c), the alpha blend result can be obtained as follows.
TX_{t,k}(i,j) = α * {1 - m1/(m1+m2)} * V_{1,t}(u_{[i,v]}, v_{[i,v]})
+ [1 - m2/(m1+m2) + (1-α) * {1 - m1/(m1+m2)}] * V_{2,t}(u_{[i,v]}, v_{[i,v]})
Here, α = α_{1,t,k-1}(u_{[i,v]}, v_{[i,v]}); that is, α is the alpha value of camera c = 1.

In this alpha blending example, the amount by which the contribution of camera c = 1 is reduced by its alpha value (the amount by which the texture drawing of camera c = 1 is made transparent) is distributed to camera c = 2, increasing the opacity of that camera's texture. In exactly the same way, with α and β as the alpha values of cameras c = 1 and c = 2, the following more general expression may be used.
TX_{t,k}(i,j) = [α * {1 - m1/(m1+m2)} + (1-β) * {1 - m2/(m1+m2)}] * V_{1,t}(u_{[i,v]}, v_{[i,v]})
+ [β * {1 - m2/(m1+m2)} + (1-α) * {1 - m1/(m1+m2)}] * V_{2,t}(u_{[i,v]}, v_{[i,v]})
As is clear from this expression, the coefficients of V_{1,t}(u_{[i,v]}, v_{[i,v]}) and V_{2,t}(u_{[i,v]}, v_{[i,v]}) are normalized alpha values that take into account both the distinction between cameras c = 1, 2 and their distances m1, m2.
Similarly, when there are three or more cameras, the texture contribution of a camera that has been made transparent by its alpha value may be distributed to the other cameras.
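The two-camera expression can be transcribed directly, which also makes it easy to check that the two coefficients always sum to 1 and that β = 1 recovers the first, single-alpha example. The function name `two_camera_blend` and the scalar pixel values are assumptions of this sketch.

```python
def two_camera_blend(v1, v2, m1, m2, alpha, beta=1.0):
    """Blend two camera samples using distances m1, m2 and alphas.

    w1 and w2 are the distance terms {1 - m1/(m1+m2)} and
    {1 - m2/(m1+m2)}; whatever opacity one camera loses through its
    alpha value is handed to the other camera.
    """
    w1 = 1.0 - m1 / (m1 + m2)
    w2 = 1.0 - m2 / (m1 + m2)
    c1 = alpha * w1 + (1.0 - beta) * w2   # coefficient of V_1
    c2 = beta * w2 + (1.0 - alpha) * w1   # coefficient of V_2
    return c1 * v1 + c2 * v2, (c1, c2)


# Both pixels fully opaque: a pure distance-weighted mix.
value, (c1, c2) = two_camera_blend(100.0, 200.0, m1=1.0, m2=3.0, alpha=1.0)
```

Because c1 + c2 = w1 + w2 = 1 for any α and β, no separate normalization step is needed in the two-camera case.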

<Step S4>
In step S4, based on the drawing result of the drawing unit 7 in the most recent step S3, the assigning unit 9 obtains the control value d_{c,t,k+1}(u,v) for the drawing unit 7 to use in the next ((k+1)-th) step S3, and the process then proceeds to step S5. As described above, the control value d_{c,t,k+1}(u,v) may be obtained as the number of times the pixel (u,v) of the image V_{c,t}(u,v) of camera c has been used for drawing up to that point.
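The bookkeeping in step S4 amounts to incrementing a per-pixel counter for every source pixel that the latest drawing pass consumed. The sketch below assumes one 2D list of counts per camera, indexed as `use_counts[v][u]`, and a list of `(u, v)` pixels reported by the drawing pass; both representations are illustrative choices.

```python
def update_use_counts(use_counts, used_pixels):
    """Increment the count of each source pixel used in the latest pass.

    use_counts: 2D list for one camera, indexed [v][u] (row, column).
    used_pixels: iterable of (u, v) pixels drawn from in this pass.
    The list is updated in place and also returned for convenience.
    """
    for u, v in used_pixels:
        use_counts[v][u] += 1
    return use_counts


# A hypothetical 2x2 camera image whose pixel (1, 0) was used twice.
counts = update_use_counts([[0, 0], [0, 0]], [(1, 0), (1, 0), (0, 1)])
```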

<Step S5>
In step S5, it is determined whether drawing has been completed for all backprojection planes P_{t,k}(i,j). If it has, that is, if k = K at that point, the process proceeds to step S7; if not, that is, if k < K, the process proceeds to step S6.

<Step S6>
In step S6, k is set to the next value k + 1, that is, the value of k is incremented by 1, and the process returns to step S2.

<Step S7>
In step S7, the drawing unit 7 projects the textures TX_{t,k}(i,j) obtained for all backprojection planes P_{t,k}(i,j) (k = 1, 2, ..., K) through the above K iterations (these textures are the drawing result of the target in xyz space) onto the image plane (u,v) of the virtual viewpoint CV to obtain the rendering result of the target (foreground), and renders the background by a known method as described above, thereby obtaining the composite video SY_t(u,v). The flow then ends.
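The control flow of steps S2 through S7 can be summarized as a loop over the plane index k. In the sketch below every processing stage is passed in as a callback, because the actual operations are described elsewhere in the document; the names `prepare_plane`, `draw_plane`, `update_counts`, and `render_final` are placeholders invented for this example, and step S2 in particular is represented only by an opaque callback.

```python
def run_drawing_loop(num_planes, prepare_plane, draw_plane,
                     update_counts, render_final):
    """Iterate over backprojection planes k = 1..K, then render once.

    All four callbacks are hypothetical stand-ins for the processing
    units described in the text.
    """
    textures = []
    k = 1
    while True:
        region = prepare_plane(k)              # step S2 (placeholder)
        texture, used = draw_plane(k, region)  # step S3: draw plane k
        update_counts(used)                    # step S4: refresh control values
        textures.append(texture)
        if k == num_planes:                    # step S5: all planes drawn?
            break
        k += 1                                 # step S6: next plane
    return render_final(textures)              # step S7: project to viewpoint


# Minimal dry run with stub callbacks that only record what happened.
log = []
result = run_drawing_loop(
    3,
    prepare_plane=lambda k: f"region{k}",
    draw_plane=lambda k, region: (f"tex{k}", [(k, k)]),
    update_counts=log.append,
    render_final=list,
)
```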

As described above, according to the present invention, by drawing on the backprojection planes P_{t,k}(i,j) in the order specified by the index k, and by distributing among cameras using alpha values or the like in the rendering that yields the composite video SY_t(u,v), the problem of quality degradation due to occlusion can be solved without sacrificing the high-speed (real-time) processing of the free viewpoint video synthesis of Patent Document 1, the prior method. Supplementary notes on the present invention follow.

(1) In the processing by the drawing unit 7 (step S7 in FIG. 7), the textures TX_{t,k}(i,j) at a plurality of different positions (i,j) on the backprojection plane P_{t,k} (positions with real-valued i, j) may be backprojected to, and thus correspond to, the same (integer) pixel (u,v) in the composite image SY_t(u,v). Such cases may be handled by processing appropriate to the GPU or other implementation. For example, the pixel may be obtained as the average of the textures at those positions (i,j) (for example, three mutually close positions such as (0.1, 0.1), (0.11, 0.09), and (0.09, 0.11)).
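The averaging option mentioned in note (1) can be sketched as grouping texture samples by the integer pixel they land on and taking the mean of each group. The input format (a list of `((u, v), value)` pairs whose target pixel has already been computed) is an assumption of this example; how a GPU implementation actually resolves such collisions may differ.

```python
from collections import defaultdict


def average_per_pixel(samples):
    """Average all texture samples that map to the same target pixel.

    samples: iterable of ((u, v), value), where (u, v) is the integer
    pixel of the composite image that a plane position projects to.
    Returns a dict from pixel to mean value.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for pixel, value in samples:
        sums[pixel] += value
        counts[pixel] += 1
    return {pixel: sums[pixel] / counts[pixel] for pixel in sums}


# Three nearby plane positions land on pixel (3, 4); one lands on (5, 5).
merged = average_per_pixel([((3, 4), 10.0), ((3, 4), 20.0),
                            ((3, 4), 30.0), ((5, 5), 7.0)])
```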

(2) The composition device 10 of the present invention can be realized as a computer of general configuration. That is, the composition device 10 can be configured by a general computer comprising a CPU (central processing unit) and a GPU (graphics processing unit), a main storage device that provides a work area to the CPU and the like, an auxiliary storage device that can be configured with a hard disk, SSD, or the like, an input interface that receives input from the user such as a keyboard, mouse, or touch panel, a communication interface for connecting to a network and communicating, a display, a camera, and a bus connecting these. The processing of each unit of the composition device 10 can be realized by the CPU and/or GPU reading and executing a program that causes that processing to be executed, but any part of the processing may instead be realized in a separate dedicated circuit or the like.

DESCRIPTION OF SYMBOLS: 10: composition device, 1: calibration unit, 2: extraction unit, 3: calculation unit, 4: plane group setting unit, 5: order setting unit, 6: backprojection unit, 7: drawing unit, 8: reprojection unit, 9: assigning unit, 20: rendering unit

Claims

1. A composition device comprising:
a calculation unit that obtains, from each viewpoint image of a multi-viewpoint image, a likelihood image of the region of a captured target;
a backprojection unit that obtains backprojection data by backprojecting each of the likelihood images onto a plurality of backprojection planes arranged in a three-dimensional space; and
a drawing unit that synthesizes a free viewpoint image of the target by rendering the target, drawing on the backprojection data by backprojecting the texture of all or some of the viewpoint images of the multi-viewpoint image, in order, onto the plurality of backprojection planes constituting the backprojection data.

2. The composition device according to claim 1, wherein, when drawing the texture of all or some of the viewpoint images by the backprojection, the drawing unit takes into account a history of whether each pixel of the viewpoint image has already been used for the drawing.

3. The composition device according to claim 1, wherein, when backprojecting and drawing the texture of all or some of the viewpoint images of the multi-viewpoint image, the drawing unit performs the drawing such that, based on the number of times a pixel of the viewpoint image has already been used for the drawing, the ratio in which the pixel is used for drawing is reduced as that number increases.

4. The composition device according to claim 3, wherein the drawing unit performs the drawing using an alpha value whose transparency increases as the number of times increases.

5. The composition device according to claim 4, wherein, when backprojecting and drawing the texture of all or some of the viewpoint images, the drawing unit performs the drawing after normalizing the alpha values so that their sum over the textures used to draw the same location is constant.

6. The composition device according to claim 1, wherein, when backprojecting and drawing the texture of all or some of the viewpoint images of the multi-viewpoint image, the drawing unit performs the drawing according to the difference between the position of the virtual viewpoint of the free viewpoint image to be synthesized and the position of the viewpoint of each viewpoint image.

7. The composition device according to claim 6, wherein the drawing unit reduces the ratio used for drawing for a viewpoint image having a larger difference.

8. The composition device according to claim 1, wherein the order is an order corresponding to the positional relationship between the virtual viewpoint of the free viewpoint image to be synthesized and the plurality of backprojection planes.

9. The composition device according to claim 8, wherein the order places each of the plurality of backprojection planes earlier the closer it is to the virtual viewpoint of the free viewpoint image to be synthesized.

10. A composition method comprising:
a calculation step of obtaining, from each viewpoint image of a multi-viewpoint image, a likelihood image of the region of a captured target;
a backprojection step of obtaining backprojection data by backprojecting each of the likelihood images onto a plurality of backprojection planes arranged in a three-dimensional space; and
a drawing step of synthesizing a free viewpoint image of the target by rendering the target, drawing on the backprojection data by backprojecting the texture of all or some of the viewpoint images of the multi-viewpoint image, in order, onto the plurality of backprojection planes constituting the backprojection data.

11. A program that causes a computer to function as a composition device comprising:
a calculation unit that obtains, from each viewpoint image of a multi-viewpoint image, a likelihood image of the region of a captured target;
a backprojection unit that obtains backprojection data by backprojecting each of the likelihood images onto a plurality of backprojection planes arranged in a three-dimensional space; and
a drawing unit that synthesizes a free viewpoint image of the target by rendering the target, drawing on the backprojection data by backprojecting the texture of all or some of the viewpoint images of the multi-viewpoint image, in order, onto the plurality of backprojection planes constituting the backprojection data.