JP7536531B2

JP7536531B2 - Image processing device, image processing method, image processing system, and program.

Info

Publication number: JP7536531B2
Application number: JP2020114814A
Authority: JP
Inventors: 直樹梅村
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2020-07-02
Filing date: 2020-07-02
Publication date: 2024-08-20
Anticipated expiration: 2040-07-02
Also published as: JP2022012755A

Description

The present invention relates to the coloring of three-dimensional models of objects when generating virtual viewpoint images.

Recently, attention has focused on a technique in which multiple real cameras are installed at different positions and capture images synchronously from multiple viewpoints, and the resulting multi-viewpoint images are used to generate a virtual viewpoint image representing the view from a virtual camera that does not physically exist. Such virtual viewpoint images let viewers watch, for example, highlight scenes of a soccer or basketball game from various angles, giving the user a greater sense of presence than ordinary video.

To generate a virtual viewpoint image, the video captured by the multiple real cameras is first collected on a server or the like, and a three-dimensional model of each object (subject) in the video is generated. The three-dimensional model is then colored (textured) based on the specified virtual viewpoint, and a two-dimensional transformation such as a projective transformation is applied to obtain the two-dimensional virtual viewpoint image. Within this generation process, view-dependent texture mapping is often used to color the three-dimensional model (Non-Patent Document 1). In view-dependent texture mapping, the color of a point of interest on the object is determined by a weighted blend of the colors at the corresponding points in the images captured by the multiple real cameras. This suppresses discontinuities in color caused by color differences between the captured images of the individual viewpoints and by errors in the three-dimensional model shape and the camera parameters, and yields a more natural texture.
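The weighted blend described above can be sketched as follows. This is a minimal illustration, not the method of Non-Patent Document 1 or of this patent: the weighting scheme (inverse of the angle between the virtual viewing direction and each real camera's viewing direction) and the names `blend_view_dependent`, `samples`, and `eps` are assumptions made for the example.

```python
import math

def _unit(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def blend_view_dependent(point, virtual_cam_pos, samples, eps=1e-6):
    """Blend the colors seen by several real cameras at one surface point.

    samples: list of (real_camera_position, (r, g, b)) pairs, where the color
    is the one found at the pixel corresponding to `point` in that camera.
    Assumed weighting: cameras whose viewing direction is angularly closer
    to the virtual camera's direction contribute more.
    """
    view_dir = _unit(_sub(virtual_cam_pos, point))
    weights = []
    for cam_pos, _ in samples:
        cam_dir = _unit(_sub(cam_pos, point))
        cos_a = sum(a * b for a, b in zip(view_dir, cam_dir))
        cos_a = max(-1.0, min(1.0, cos_a))  # guard acos against rounding
        weights.append(1.0 / (math.acos(cos_a) + eps))
    total = sum(weights)
    return tuple(
        sum(w * color[ch] for w, (_, color) in zip(weights, samples)) / total
        for ch in range(3)
    )
```

With two real cameras placed symmetrically about the virtual camera, the result is an even mix of the two colors; as the virtual camera approaches one real camera, that camera's color dominates.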

P. E. Debevec, C. J. Taylor, and J. Malik: "Modeling and Rendering Architecture from Photographs: A Hybrid Geometry- and Image-Based Approach," SIGGRAPH '96, pp. 11-20, 1996.

View-dependent texture mapping uses the images captured by real cameras located close to the virtual camera. Consequently, if a structure lies between the virtual camera and the object, the color of that structure may be used to color the object, producing an unnatural appearance. For example, in a soccer match, if a real camera close to the virtual camera captures a player through the goal net, a texture that includes the goal net is pasted onto the player's three-dimensional model. A virtual viewpoint image colored in this way not only differs from the actual appearance but also makes the player's movements and details hard to discern. Furthermore, if the virtual camera is placed so as to view the player from inside the goal, the texture of the goal net, which should not be visible from that virtual camera, is pasted onto the player's three-dimensional model, and the virtual viewpoint image becomes unnatural.

The technology of the present disclosure has been made in view of the above problem, and its object is to reproduce the color of an object appropriately in a virtual viewpoint image even when a structure that obstructs the object is present.

The image processing device according to the present disclosure performs rendering using shape data representing the three-dimensional shape of an object to generate a virtual viewpoint image corresponding to a virtual viewpoint. It comprises a determination means that determines the processing method used to color the object in the rendering, and a rendering means that colors the object according to the determined processing method. The shape data is generated from a plurality of first captured images obtained by a plurality of first imaging devices arranged around an imaging target area in which a structure is present. Based on the positional relationship among the virtual viewpoint, the structure, and the object, the determination means determines whether the color information for the coloring is acquired from a captured image included in the plurality of first captured images or from a second captured image obtained by a second imaging device placed where it can capture the object without being obstructed by the structure.

According to the technology of the present disclosure, the color of an object can be reproduced appropriately in a virtual viewpoint image even when a structure that obstructs the object is present.

FIG. 1 is a block diagram showing an example of the configuration of an image processing system.
FIG. 2 is a diagram showing an example of the arrangement of imaging modules.
FIG. 3 is a diagram showing an example of the hardware configuration of a server.
FIG. 4 is a block diagram showing the functional configuration of the server.
FIG. 5 is a diagram showing an example of virtual camera settings.
FIG. 6 is a diagram showing the imaging range of a virtual camera.
FIG. 7 is a flowchart showing the flow of a rendering process.
FIGS. 8(a) and 8(b) are diagrams explaining the positional relationship between an object and each camera.
FIGS. 9(a) and 9(b) are diagrams showing an example of the result of a rendering process.
FIG. 10 is a diagram showing the positions of the virtual camera at which coloring using an image captured by an external camera is not possible.

Embodiments of the present invention are described below with reference to the drawings. The following embodiments do not limit the present invention, and not all combinations of the features described in the embodiments are essential to the solution provided by the present invention. Identical components are denoted by the same reference numerals.

[Embodiment 1]
(Basic system configuration)
FIG. 1 is a block diagram showing an example of the configuration of an image processing system according to this embodiment. The image processing system 100 includes imaging modules 110a to 110j and 120, a switching hub 115, a server 116, a database (DB) 117, a control device 118, and a display device 119. Each of the imaging modules 110a to 110j contains one of the cameras 111a to 111j and one of the camera adapters 112a to 112j, connected by internal wiring (a to j denote the ten units a, b, c, d, e, f, g, h, i, and j). Similarly, the imaging module 120 contains a camera 121 and a camera adapter 122 connected by internal wiring. Each of the imaging modules 110a to 110j and 120 transmits data over a network cable. The switching hub (hereinafter "HUB") 115 is a device that routes traffic between the network devices. The imaging modules 110a to 110j and 120 are connected to the HUB 115 by network cables 113a to 113k. Similarly, the server 116 and the control device 118 are connected to the HUB 115 by network cables 113l and 113n, respectively. The server 116 and the DB 117 are connected by a network cable 113m, and the control device 118 and the display device 119 are connected by a video cable 114. The cameras 111a to 111j and 121 capture images in precise synchronization with one another based on a synchronization signal.

In this embodiment, the imaging modules 110a to 110j are installed so as to surround a soccer field, which is the imaging target area, as shown in FIG. 2(a). FIG. 2(a) shows the field as viewed from directly above, and the imaging modules 110a to 110j are assumed to be installed at the same fixed height above the ground. The imaging module 120, on the other hand, is installed inside the soccer goal, a structure whose position is fixed, and is directed toward the center of the field (that is, the direction in which objects such as players are present), as shown in FIG. 2(b). FIG. 2(b) is an enlarged view of the part enclosed by the dash-dot line in FIG. 2(a). In the following description, the imaging modules 110a to 110j installed around the field are called "external cameras", and the imaging module 120 installed inside the soccer goal is called the "internal camera". Although this embodiment uses ten external cameras 110 and one internal camera 120, these numbers are only an example and are not limiting.

The server 116 processes the captured images sent from the external cameras 110, generates three-dimensional models of objects, and colors the generated three-dimensional models (also called "texture pasting" or "texture mapping"). In this embodiment, the objects for which three-dimensional models are generated are moving objects such as players and the ball. Here, a "three-dimensional model" means shape data representing the three-dimensional shape of an object. The server 116 also has a time server function that generates the time synchronization signal used to synchronize the system. The database (hereinafter "DB") 117 stores data such as the image data processed by the server 116 and the generated three-dimensional models, and provides the stored data to the server 116. The control device 118 is an information processing device that controls the external cameras 110, the internal camera 120, and the server 116. The control device 118 is also used to set the virtual camera (virtual viewpoint). The display device 119 displays a settings user interface screen (UI screen) on which the user specifies a virtual viewpoint via the control device 118, as well as a UI screen for viewing the generated virtual viewpoint image.

(Operation of the image processing system)
Next, the general operation of the image processing system 100 is described. The image captured by the external camera 110a undergoes predetermined image processing such as foreground/background separation and is then transmitted to the external camera 110b. The external camera 110b likewise transmits its own captured image, together with the captured image received from the external camera 110a, to the external camera 110c. By repeating this operation, the captured images of all ten sets (including the foreground images) are passed from the external camera 110j to the HUB 115 and then transmitted to the server 116. The image captured by the internal camera 120 also undergoes predetermined image processing such as foreground/background separation and is then transmitted to the server 116 via the HUB 115.
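The relay between external cameras can be pictured with the short sketch below. It only models the accumulation of payloads along the chain; the function name `relay_daisy_chain`, the `capture` callback, and the string payloads are hypothetical stand-ins for the processed images.

```python
def relay_daisy_chain(module_ids, capture):
    """Model the camera-to-camera relay described above.

    Each module appends its own processed image to the payload received
    from the previous module and forwards the result. Returns the payload
    the last module hands to the hub and the payload size after each hop.
    """
    payload = []
    hop_sizes = []
    for module_id in module_ids:
        payload = payload + [(module_id, capture(module_id))]
        hop_sizes.append(len(payload))
    return payload, hop_sizes

# Ten external cameras 110a to 110j; the payload content is a placeholder.
modules = ["110" + suffix for suffix in "abcdefghij"]
payload, hop_sizes = relay_daisy_chain(modules, lambda m: "foreground-" + m)
```

The payload grows by one set per hop, so the last module in the chain forwards all ten sets to the hub.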

The server 116 generates three-dimensional models of objects and performs rendering based on the captured image data from different viewpoints (hereinafter "multi-viewpoint image data") acquired from the external camera 110j and the internal camera 120. The server 116 also transmits the time and a synchronization signal to each of the external cameras 110a to 110j and to the internal camera 120. On receiving them, each of the external cameras 110a to 110j and the internal camera 120 captures images using the received time and synchronization signal and frame-synchronizes the captured images. In other words, the external cameras 110 and the internal camera 120 capture each frame at the same synchronized time.

(Hardware configuration of the server)
Next, the hardware configuration of the server 116, which is responsible for generating the virtual viewpoint image, is described with reference to FIG. 3. The DB 117, the control device 118, and the camera adapters 112 and 122 basically have the same hardware configuration. The server 116 has a CPU 201, a ROM 202, a RAM 203, an auxiliary storage device 204, a communication I/F 205, and a bus 206.

The CPU 201 controls the entire server 116 using programs and data stored in the ROM 202 and the RAM 203, and implements the functional units shown in FIG. 4, described later. The server 116 may have one or more pieces of dedicated hardware separate from the CPU 201, and the dedicated hardware may execute at least part of the processing otherwise performed by the CPU 201. Examples of dedicated hardware include an ASIC (application-specific integrated circuit), an FPGA (field-programmable gate array), and a DSP (digital signal processor). The ROM 202 stores programs and other data that do not need to be changed. The RAM 203 temporarily stores programs and data supplied from the auxiliary storage device 204 and data supplied from outside via the communication I/F 205. The auxiliary storage device 204 consists of, for example, an HDD or SSD, and stores various data and programs: input data such as image data and audio data, tables referenced in the various processes described later, and application programs. The communication I/F 205 is used for communication with external devices. For example, when the server 116 is connected to an external device by wire, a communication cable is connected to the communication I/F 205. When the server 116 has a function for communicating wirelessly with an external device, the communication I/F 205 includes an antenna. The bus 206 connects the above components and carries data and signals between them. In this embodiment, the system is configured so that the server 116 generates a virtual viewpoint image based on information (virtual viewpoint information) indicating the position and orientation of the virtual camera (virtual viewpoint) set on the control device 118, and the user views that image on the display device 119. The system configuration is not limited to this, however; for example, a single information processing device that combines a user interface function for entering virtual viewpoint information with a user interface function for viewing the virtual viewpoint image may be incorporated.

(Functional configuration of the server)
FIG. 4 is a block diagram showing the functional configuration of the server 116 according to this embodiment, which serves as an image processing device that generates three-dimensional models of objects and performs rendering based on virtual viewpoint information. The server 116 has a three-dimensional model generation unit 401, a processing method determination unit 402, and a rendering unit 403. The server 116 implements the function of each unit shown in FIG. 4 by having the CPU 201 execute a predetermined program stored in the RAM 203 or the ROM 202. In this embodiment a single server 116 implements the functions of all these units, but the functions may instead be distributed across multiple servers. For example, one server may take on the function of the three-dimensional model generation unit 401 while another takes on the two functions of the processing method determination unit 402 and the rendering unit 403.

The multi-viewpoint image data input to the server 116 (the captured images corresponding to multiple viewpoints, obtained by synchronized capture) is first input to the three-dimensional model generation unit 401.

The three-dimensional model generation unit 401 first extracts the silhouette of each object from each of the captured images taken from different viewpoints. Then, using the resulting silhouette images (foreground images) and the separately input camera parameters of each external camera 110, it generates a three-dimensional model, that is, shape data representing the three-dimensional shape of the object, by the volume intersection method (visual hull) or a similar technique. The camera parameters include extrinsic parameters, intrinsic parameters, and distortion parameters. The intrinsic parameters represent the focal length and the optical center, called the principal point. The extrinsic parameters represent the position and viewing direction of the camera and the rotation angle about the viewing direction. The distortion parameters are coefficients representing the radial image distortion caused by differences in the refractive index of the lens and the tangential (circumferential) distortion caused by the lens and the image plane not being parallel. This embodiment assumes that the captured scene is a soccer match in which multiple players move around a large field. The three-dimensional models generated by the three-dimensional model generation unit 401 are therefore data representing the three-dimensional shape of each object, such as each player and the ball. In the three-dimensional model of each object, the three-dimensional shape is represented, for example, as a set (cluster) of unit cubes called voxels. Other representations, such as a point cloud or polygons, may be used instead of voxels.
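A minimal sketch of silhouette-based voxel carving, one common form of the volume intersection method named above, is shown here for illustration. It assumes an ideal pinhole camera and ignores the distortion parameters; the dictionary keys (`R`, `t`, `f`, `cx`, `cy`), the function names, and the boolean-grid silhouette format are assumptions for the example, not details taken from this document.

```python
def project(point, cam):
    """Project a world point to pixel coordinates with a pinhole model.

    cam["R"] and cam["t"] are the extrinsic rotation and translation,
    cam["f"], cam["cx"], cam["cy"] the focal length and principal point.
    Lens distortion is ignored in this sketch.
    """
    R, t = cam["R"], cam["t"]
    xc, yc, zc = (sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3))
    if zc <= 0:  # point is behind the camera
        return None
    return (cam["f"] * xc / zc + cam["cx"], cam["f"] * yc / zc + cam["cy"])

def carve(voxel_centers, cameras, silhouettes):
    """Keep only voxels whose projection falls inside every silhouette."""
    kept = []
    for voxel in voxel_centers:
        inside_all = True
        for cam, sil in zip(cameras, silhouettes):
            uv = project(voxel, cam)
            if uv is None:
                inside_all = False
                break
            u, v = int(round(uv[0])), int(round(uv[1]))
            if not (0 <= v < len(sil) and 0 <= u < len(sil[0]) and sil[v][u]):
                inside_all = False
                break
        if inside_all:
            kept.append(voxel)
    return kept

# Two toy cameras: one looking along +z, one looking along +x.
cam_z = {"R": [[1, 0, 0], [0, 1, 0], [0, 0, 1]], "t": (0, 0, 5), "f": 10, "cx": 10, "cy": 10}
cam_x = {"R": [[0, 0, -1], [0, 1, 0], [1, 0, 0]], "t": (0, 0, 5), "f": 10, "cx": 10, "cy": 10}
# 21 x 21 silhouette: a 5 x 5 foreground square centred on the principal point.
square = [[abs(u - 10) <= 2 and abs(v - 10) <= 2 for u in range(21)] for v in range(21)]
candidates = [(0, 0, 0), (2, 0, 0), (0, 0, 2), (1, 0, 0)]
model = carve(candidates, [cam_z, cam_x], [square, square])
```

A voxel survives only if every camera sees it inside the foreground, which is why adding viewpoints tightens the recovered shape.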

The processing method determination unit 402 determines the processing method to be used for rendering based on the virtual viewpoint information indicating the position and orientation of the virtual camera, the position information of the structures present in the captured scene, and the position information of objects such as players. Specifically, it analyzes the mutual positional relationship among the virtual camera, the structure, and the object, and determines whether the three-dimensional model is to be colored using color information (texture information) from the images captured by the external cameras 110 or from the image captured by the internal camera 120. "Coloring the three-dimensional model" can be restated as selecting pixels of a captured image based on the three-dimensional positions of the voxels making up the three-dimensional model and thereby determining the pixel values in the virtual viewpoint image. The virtual viewpoint information is assumed to be generated on the control device 118, for example by the user placing a virtual camera in the imaging space with a joystick or the like, and to be input from the control device 118 to the server 116. A structure present in the captured scene is a stationary body that is installed in the imaging space and does not move unless an external force is applied; this embodiment assumes a soccer goal including its goal net. Because the position of a structure is fixed, its position coordinates can be acquired in advance, for example before the match begins, and held in the auxiliary storage device 204 or the like. The position information of the objects is generated when the three-dimensional models are generated and is obtained from the three-dimensional model generation unit 401. The position information of a structure or an object includes position coordinates (world coordinates) specifying its three-dimensional position in the imaging space and information specifying its size.

Following the rendering processing method determined by the processing method determination unit 402, the rendering unit 403 colors (textures) the three-dimensional model generated by the three-dimensional model generation unit 401 based on the virtual viewpoint information, and generates an image representing the view from the virtual viewpoint.

(Details of the rendering process)
Next, the rendering process in this embodiment is described in detail using a specific example. FIG. 5 is a bird's-eye view from directly above the field showing the camera arrangement of FIG. 2 when a virtual camera 303 is set at a position that captures the goalkeeper 302 through the soccer goal 301. FIG. 6 shows the field of view (angle of view) from the virtual camera 303. As is clear from FIG. 6, when the goalkeeper 302 is viewed from the virtual camera 303, the goal net 301a of the soccer goal 301 lies in front of the goalkeeper 302 and obstructs the view. With this case in mind, the operation of the processing method determination unit 402 and the rendering unit 403 of the server 116 is described along the flowchart shown in FIG. 7. The series of processes shown in the flowchart of FIG. 7 is carried out once the three-dimensional model generation unit 401 has finished generating the three-dimensional models of the objects, by the CPU 201 loading a predetermined control program stored in the auxiliary storage device 204 into the RAM 203 and executing it. Of the steps shown in the flow of FIG. 7, S707, S708, S710, and S711 are handled by the rendering unit 403, and the remaining steps are handled by the processing method determination unit 402. In the following description, the symbol "S" denotes a step.

In S701, the object of interest to be rendered is determined from among the one or more objects present in the field of view (angle of view) of the virtual camera specified by the virtual viewpoint information input from the control device 118.

In the following S702, a nearby external camera located within a certain range of the virtual camera is determined from among the external cameras 110a to 110j. The position of the virtual camera is specified by the virtual viewpoint information described above, and the position of each of the external cameras 110a to 110j is specified by the camera parameters described above. Here, the single external camera regarded as closest to the set virtual camera is selected. Specifically, of the ten external cameras 110a to 110j, the one whose field of view (angle of view) overlaps the field of view of the virtual camera by the largest proportion is selected. For example, for the virtual camera 303 set at the position shown in FIG. 5, the external camera 110f is selected. FIG. 8(a) shows the positional relationship among the goalkeeper 302, the soccer goal 301, the external camera 110f, the internal camera 120, and the virtual camera 303 in the camera arrangement of FIG. 5, as viewed from the long side of the field. FIG. 8(b) shows the positional relationship when a virtual camera 303' is set inside the goal net 301a, on the extension of the optical axis of the virtual camera 303. In both FIG. 8(a) and FIG. 8(b), the fields of view of the virtual cameras 303 and 303' overlap substantially with the field of view of the external camera 110f. Once the external camera closest to the virtual camera has been determined in this way, the process proceeds to S703. Although this embodiment determines only the single nearest external camera, the selection is not limited to this. For example, multiple external cameras within a certain range of the virtual camera may be determined, and the subsequent processing may be performed with each weighted according to its distance from the virtual camera.
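One way to picture the overlap-based selection in S702 is the sketch below. It is a simplified top-down (2-D) approximation under assumptions that are not in this document: each camera is reduced to a position, a yaw angle, and a horizontal half angle of view, and the overlap proportion is estimated by counting sample points on the field. The names `sees`, `overlap_ratio`, `pick_outer_camera`, and all coordinates are hypothetical.

```python
import math

def sees(cam, p):
    """True if the 2-D point p lies within the camera's horizontal angle of view."""
    dx, dy = p[0] - cam["pos"][0], p[1] - cam["pos"][1]
    if dx == 0 and dy == 0:
        return False
    diff = math.atan2(dy, dx) - cam["yaw"]
    diff = (diff + math.pi) % (2 * math.pi) - math.pi  # wrap to [-pi, pi)
    return abs(diff) <= cam["half_fov"]

def overlap_ratio(virtual_cam, real_cam, samples):
    """Fraction of the sample points seen by the virtual camera that the real camera also sees."""
    in_virtual = [p for p in samples if sees(virtual_cam, p)]
    if not in_virtual:
        return 0.0
    return sum(sees(real_cam, p) for p in in_virtual) / len(in_virtual)

def pick_outer_camera(virtual_cam, outer_cams, samples):
    """Return the name of the external camera with the largest overlap ratio."""
    return max(outer_cams, key=lambda name: overlap_ratio(virtual_cam, outer_cams[name], samples))

# Illustrative layout: sample points on a small grid standing in for the field.
field = [(x, y) for x in range(11) for y in range(7)]
virtual = {"pos": (-2, 3), "yaw": 0.0, "half_fov": math.radians(30)}
outer = {
    "110a": {"pos": (5, -5), "yaw": math.pi / 2, "half_fov": math.radians(30)},
    "110f": {"pos": (-5, 3), "yaw": 0.0, "half_fov": math.radians(30)},
}
chosen = pick_outer_camera(virtual, outer, field)
```

Extending this to the weighted multi-camera variant mentioned above would mean returning every camera whose ratio exceeds a threshold together with a weight, rather than a single name.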

In S703, a voxel of interest to be colored is determined from among the voxels constituting the three-dimensional model of the target object. The voxel of interest may be a set of multiple voxels (a voxel group of interest). In the next step, S704, it is determined whether a structure is present at a position that occludes the voxel of interest determined in S703 when that voxel is viewed from the external camera determined in S702. For this determination, the position information of the structure, which is placed at a predetermined position, and the position information of the external camera determined in S702 are read from the RAM 203 or the auxiliary storage device 204. For example, in the cases of FIG. 8(a) and (b) described above, it is determined whether the color information at the corresponding pixel position in the image captured by the external camera 110f can be used to color the voxel of interest in the three-dimensional model of the goalkeeper 302. In other words, this step determines whether a structure at a fixed position (here, the goal net 301a) occludes even a part of the target object (the goalkeeper 302). In both cases of FIG. 8(a) and (b), the external camera closest to the virtual camera is 110f. As illustrated, the soccer goal 301 lies between the goalkeeper 302 and the external camera 110f, so the goal net 301a appears in front of the goalkeeper 302 in the captured image. That is, in both cases the goalkeeper 302 is occluded by the goal net 301a. Therefore, in the cases of FIG. 8(a) and (b), it is determined that a structure that occludes the voxel of interest (hereinafter referred to as an "obstruction") exists. If an obstruction exists for the voxel of interest, the process proceeds to S705; if not, the process proceeds to S709.
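As a rough illustration of the S704 check, the sketch below tests whether the line of sight from the external camera to the voxel of interest passes through the structure. It assumes, purely for illustration, that the stored position information of the structure is an axis-aligned bounding box; the function name and the coordinates in the usage note are hypothetical and not taken from the embodiment.

```python
def segment_hits_box(start, end, box_min, box_max):
    """Slab test: does the segment start->end pass through the axis-aligned box?

    start   : external camera position (x, y, z)
    end     : position of the voxel of interest (x, y, z)
    box_min : minimum corner of the structure's bounding box (assumed representation)
    box_max : maximum corner of the structure's bounding box
    """
    t_enter, t_exit = 0.0, 1.0  # parameter range of the segment still inside all slabs
    for s, e, lo, hi in zip(start, end, box_min, box_max):
        d = e - s
        if d == 0:
            # Segment is parallel to this slab: it must already lie inside it.
            if s < lo or s > hi:
                return False
            continue
        t0, t1 = (lo - s) / d, (hi - s) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_enter, t_exit = max(t_enter, t0), min(t_exit, t1)
        if t_enter > t_exit:
            return False
    return True
```

For example, with a camera behind the goal at `(0, -10, 1)`, a voxel in front of the goal at `(0, 5, 1)`, and a box spanning `(-3.66, -2, 0)` to `(3.66, 0, 2.44)`, the function reports an obstruction, which corresponds to proceeding to S705.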

In S705, which is reached when it is determined in S704 that an obstruction exists, it is determined whether color information for coloring the voxel of interest can be obtained from the image captured by the internal camera installed inside the obstruction (here, the goal net 301a). Here, "color information can be obtained" means that the original color of the target object can be properly obtained from the corresponding pixel position in the captured image, without any other object, including the obstruction, appearing in front of it. In the cases of FIG. 8(a) and (b) above, no obstructing object, such as another player, exists between the internal camera 120 and the goalkeeper 302. Therefore, it is determined that color information for coloring the voxel of interest can be obtained from the image captured by the internal camera 120. If the color information can be obtained from the image captured by the internal camera 120, the process proceeds to S706; if not, the process proceeds to S711. In S706, it is determined whether the color information of the image captured by the internal camera is to be used to color the voxel of interest. This determination may be left to user selection via a UI screen (not shown), or may be made automatically according to certain conditions. As one such condition, the determination may be made according to an attribute of the target object (information indicating the object type, such as player, referee, or ball). If the color information of the image captured by the internal camera is to be used for coloring, the process proceeds to S707; otherwise, the process proceeds to S708. Note that the determination in S706 may be skipped, with the process proceeding to S707 as soon as it is determined that the color information of the image captured by the internal camera can be obtained (Yes in S705).
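The S706 determination could be sketched as below. The policy table and its values are hypothetical; the description only states that the decision may follow a user selection or an object attribute such as player, referee, or ball.

```python
# Hypothetical policy: which object types prefer the internal camera's colors.
# The actual conditions are not specified in the description.
USE_INTERNAL_BY_TYPE = {"player": True, "referee": True, "ball": False}

def use_internal_camera(object_type, user_choice=None):
    """S706 sketch: an explicit user selection wins; otherwise apply the policy.

    user_choice : True/False from a UI screen, or None if no selection was made.
    Unknown object types fall back to the external camera (an assumption).
    """
    if user_choice is not None:
        return user_choice
    return USE_INTERNAL_BY_TYPE.get(object_type, False)
```

A return value of `True` corresponds to proceeding to S707, and `False` to S708.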

In S707, the voxel of interest is colored using color information obtained from the corresponding pixel position in the image captured by the internal camera. In S708, the voxel of interest is colored using color information obtained from the corresponding pixel position in the image captured by one of the external cameras (for example, the external camera closest to the virtual camera, as determined in S702). After the rendering process in S707 or S708 is completed, the process proceeds to S712.

In S709, which is reached when it is determined in S704 that no obstruction exists, it is determined whether color information for coloring the voxel of interest can be obtained from an image captured by any of the external cameras (including the external camera determined in S702). In addition to structures such as the soccer goal, multiple players, referees, and the like are present on the field, so even when it is determined in S704 that no obstruction exists, another player or the like may be located between an external camera and the target object. Taking this into account, it is determined whether the color information used for coloring the voxel of interest can be obtained from an image captured by an external camera. Specifically, the positional relationship among the virtual camera, the external cameras, and each object is first derived from the position coordinates of the virtual camera, the position coordinates of all objects present within the field of view (angle of view) of the virtual camera, and the camera parameters of all the external cameras. It is then determined whether there is an image captured by an external camera from which the color information of the target object (here, the goalkeeper's original color) can be obtained. If the original color information of the target object can be obtained from the image captured by any of the external cameras 110a to 110j, the process proceeds to S710; if not, the process proceeds to S711.

In S710, the voxel of interest is colored by acquiring color information from the corresponding pixel position in the image captured by the external camera for which it was determined in S709 that color information can be acquired. In S711, which is reached when color information cannot be acquired from the image captured by any of the real cameras (all the external cameras and the internal camera), the voxel of interest is colored by correction (interpolation) using the color information of the area surrounding the voxel of interest. Alternatively, the voxel of interest may be left uncolored and processed as a transparent voxel with no color information. After the coloring process of S710 or S711 is completed, the process proceeds to S712.
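The branching from S704 through S711 can be summarized in a short sketch. The function names and boolean inputs are hypothetical stand-ins for the determinations described above, and the neighbor-averaging used for S711 is only one assumed form of the interpolation, which the description does not specify further.

```python
def choose_color_step(occluded, internal_ok, use_internal, external_ok):
    """Return the step label that colors the voxel of interest.

    occluded     : S704, an obstruction hides the voxel from the chosen external camera
    internal_ok  : S705, the internal camera sees the object's original color
    use_internal : S706, user or policy selects the internal camera
    external_ok  : S709, some external camera sees the object's original color
    """
    if occluded:
        if not internal_ok:
            return "S711"  # no usable image: interpolate or leave transparent
        return "S707" if use_internal else "S708"
    return "S710" if external_ok else "S711"

def interpolate_from_neighbors(neighbor_colors):
    """S711 sketch: average the RGB colors of already-colored neighbors.

    Returns None when no neighbor has a color, meaning the voxel is left
    transparent (the alternative mentioned in the description).
    """
    known = [c for c in neighbor_colors if c is not None]
    if not known:
        return None
    n = len(known)
    return tuple(sum(c[i] for c in known) / n for i in range(3))
```

Keeping the decision separate from the color lookup makes each branch easy to check against the flowchart of FIG. 7.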

In S712, it is determined whether the coloring process has been completed for all voxels constituting the target object (that is, whether any uncolored voxels remain). If the coloring process has been completed for all voxels, the process proceeds to S713; if any unprocessed voxels remain, the process returns to S703, where the next voxel of interest is determined and processing continues.

In S713, it is determined whether the rendering process has been completed for all objects present in the field of view (angle of view) of the virtual camera. If it has been completed, this process ends. If any unprocessed objects remain, the process returns to S701, where the next target object is determined and processing continues.
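The two completion checks (S712 for voxels, S713 for objects) amount to a nested loop. The sketch below assumes a hypothetical data layout in which each object maps to a list of voxel identifiers, and a caller-supplied `color_voxel` callable that stands in for steps S704 to S711.

```python
def render_objects(objects, color_voxel):
    """Color every voxel of every object in the virtual camera's field of view.

    objects     : {object_id: [voxel_id, ...]} (assumed layout)
    color_voxel : callable(object_id, voxel_id) -> color, or None for transparent
    """
    colored = {}
    for object_id, voxels in objects.items():  # S701, repeated until S713 is satisfied
        colored[object_id] = {}
        for voxel in voxels:                   # S703, repeated until S712 is satisfied
            colored[object_id][voxel] = color_voxel(object_id, voxel)
    return colored
```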

The above is the content of the rendering process according to this embodiment. FIG. 9(a) and (b) each show an example of the result of performing the rendering process of this embodiment with the goalkeeper 302 in FIG. 5 and FIG. 6 described above as the target object. FIG. 9(a) shows the result when the rendering is ultimately performed in S708, in which the goal net 304 appears on the goalkeeper 302. FIG. 9(b), on the other hand, shows the result when the rendering is ultimately performed in S707, in which the goal net 304 does not appear on the goalkeeper 302 and the goalkeeper 302 is colored with its original color. Through this rendering process, a virtual viewpoint image in which the three-dimensional model is appropriately colored is obtained.

<Modification>
In the flowchart of FIG. 7, color information for coloring is acquired from the image captured by the internal camera only when an obstruction exists. That is, the flowchart assumes that when no obstruction exists when the target object is viewed from the virtual camera, color information is acquired preferentially from the images captured by the external cameras to color the three-dimensional model. However, the present invention is not limited to such a configuration. For example, even when no obstruction exists for the target object, color information may be acquired from the image captured by the internal camera to color the three-dimensional model. Furthermore, the image captured by the internal camera may be used together with the images captured by the external cameras when generating the three-dimensional model.

In addition, among the areas in which the virtual camera (virtual viewpoint) can be set, an area that is highly likely to be occluded by a structure may be calculated in advance and stored, and that information may be used to determine whether an obstruction exists. FIG. 10 is the schematic diagram of FIG. 5 described above, with an area superimposed that indicates the virtual camera positions for which appropriate coloring from the images captured by the external cameras is not possible. In FIG. 10, when the virtual camera is set within the hatched area 1000, the goal net always appears in the images captured by the external cameras 110e, 110f, and 110g. This means that when the goalkeeper 302 is viewed from within the hatched area 1000, the goal net is always in the way, and the original color information of the goalkeeper 302 cannot be obtained from the images captured by the external cameras 110e, 110f, and 110g. In other words, when the virtual camera is set within the hatched area 1000 and the original color of the goalkeeper 302 is to be used for coloring its three-dimensional model, the image captured by the internal camera 120 must be used. Therefore, in the determination of S704 in the flowchart of FIG. 7 described above, when the virtual camera specified by the virtual viewpoint information is set within a specific area such as the hatched area 1000, it may always be determined that an obstruction exists, and the process may proceed to S705.
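A minimal sketch of this shortcut follows, assuming the stored area is a two-dimensional polygon on the field plane. The polygon representation, the function names, and the coordinates in the usage note are hypothetical; the description only says that the area is calculated in advance and stored.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: is the 2D point inside the polygon (list of (x, y) vertices)?"""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Count edges that a horizontal ray from the point toward +x crosses.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def obstruction_assumed(virtual_camera_xy, stored_areas):
    """S704 shortcut: treat the voxel as occluded if the virtual camera lies in any stored area."""
    return any(point_in_polygon(virtual_camera_xy, area) for area in stored_areas)
```

For instance, with a trapezoidal area `[(-5, 0), (5, 0), (15, -20), (-15, -20)]` standing in for the hatched area 1000, a virtual camera at `(0, -10)` would be judged to have an obstruction without any per-voxel line-of-sight test.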

This embodiment has been described using an example in which a net-like structure acts as the obstruction, with the generation of a virtual viewpoint image of a shooting scene or the like in a soccer match as viewed from behind the goalkeeper in mind, but the present invention is not limited to this. First, the above embodiment can be applied as is to sports that use a goal net in the same way as soccer, such as handball, futsal, and water polo. This embodiment can also be applied to sports in which opponents face each other across a net, such as volleyball, tennis, badminton, and table tennis. In this case, internal cameras that each capture their own side of the court may be installed on both sides of the net. In basketball, the acrylic backboard can be treated as an obstruction in addition to the goal net. Furthermore, the embodiment is also applicable to sports in which a protective net is used, such as baseball and the throwing events in track and field. In this case, the internal camera may be installed inside the playing area (at a position closer to the players than the protective net).

As described above, according to this embodiment, when a three-dimensional model of an object is colored, the coloring can be performed appropriately even if a structure is present at a position that occludes the object.

(Other Embodiments)
The present invention can also be realized by a process in which a program that implements one or more functions of the above-described embodiment is supplied to a system or device via a network or a storage medium, and one or more processors in a computer of the system or device read and execute the program. The present invention can also be realized by a circuit (for example, an ASIC) that implements one or more of the functions.

Claims

An image processing device that performs rendering processing using shape data representing a three-dimensional shape of an object to generate a virtual viewpoint image corresponding to a virtual viewpoint, the image processing device comprising:
a determining means for determining a processing method for coloring the object in the rendering process; and
a rendering means for coloring the object in accordance with the determined processing method,
wherein the shape data is generated based on a plurality of first captured images obtained by performing imaging using a plurality of first imaging devices arranged around an imaging target area,
a structure is present in the imaging target area, and
the determining means determines, based on a positional relationship between the virtual viewpoint, the structure, and the object, whether color information for the coloring is to be acquired from a captured image included in the plurality of first captured images, or from a second captured image acquired by capturing an image with a second imaging device disposed at a position where the object can be captured without being obstructed by the structure.

The image processing device according to claim 1, characterized in that the determining means determines that, when the object is obstructed by the structure when viewed from a specific imaging device among the plurality of first imaging devices that is located within a certain range from the virtual viewpoint, the color information for the coloring is obtained from the second captured image.

The image processing device according to claim 2, characterized in that the specific imaging device is an imaging device that has a field of view closest to the field of view of the virtual viewpoint among the plurality of first captured images.

The image processing device according to claim 1, characterized in that the determining means determines that, when the virtual viewpoint is between the object and the structure, the color information for the coloring is obtained from a second captured image.

The image processing device according to claim 1, characterized in that the determination means determines that, if the structure is present in the field of view of the virtual viewpoint and is between the object and the virtual viewpoint, the color information for the coloring is obtained from a second captured image.

The image processing device according to claim 1, characterized in that the determining means:
displays a user interface screen for allowing a user to select whether or not to acquire color information for the coloring from the second captured image, when color information for the coloring can be acquired from the second captured image; and
determines whether color information for the coloring is to be acquired from a captured image included in the plurality of first captured images or from the second captured image, based on a user selection via the user interface screen.

The image processing device according to any one of claims 1 to 6, characterized in that the structure has a net-like shape.

The image processing device according to claim 1, further comprising a means for storing information on a specific area in which the object is likely to be blocked by the structure, among the areas in which the virtual viewpoint can be set,
wherein, when the virtual viewpoint is set within the specific area indicated by the stored information, the determining means determines that color information for the coloring is to be acquired from the second captured image.

The image processing device according to claim 1, characterized in that:
the shape data represents a three-dimensional shape of the object in voxels or a point cloud;
the determining means determines a processing method for the coloring in units of the voxels or points; and
the rendering means performs the coloring in units of the voxels or points.

The image processing device according to any one of claims 1 to 9, characterized in that the number of the first imaging devices is greater than the number of the second imaging devices.

An image processing method for generating a virtual viewpoint image according to a virtual viewpoint by performing a rendering process using shape data representing a three-dimensional shape of an object, the image processing method comprising:
a determining step of determining a processing method for coloring the object in the rendering process; and
a rendering step of coloring the object in accordance with the determined processing method,
wherein the shape data is generated based on a plurality of first captured images obtained by performing imaging using a plurality of first imaging devices arranged around an imaging target area,
a structure is present in the imaging target area, and
in the determining step, whether color information for the coloring is to be obtained from a captured image included in the plurality of first captured images or from a second captured image obtained by capturing an image with a second imaging device disposed at a position where the object can be captured without being obstructed by the structure is determined based on a positional relationship between the virtual viewpoint, the structure, and the object.

A program for causing a computer to function as an image processing device according to any one of claims 1 to 10.