JP7355960B2

JP7355960B2 - System and method for efficient multi-GPU rendering of geometry by geometry analysis during rendering

Info

Publication number: JP7355960B2
Application number: JP2023052155A
Authority: JP
Inventors: イー．サーニーマーク; バーグオフトビアス; シンプソンデイビッド
Original assignee: Sony Interactive Entertainment Inc
Current assignee: Sony Interactive Entertainment Inc
Priority date: 2020-02-03
Filing date: 2023-03-28
Publication date: 2023-10-03
Anticipated expiration: 2041-02-01
Also published as: CN115335866A; JP7254252B2; JP2023171822A; JP7530534B2; EP4100923A1; WO2021158468A1; JP2024096226A; JP2023503190A; JP7481560B2; JP2023080128A

Description

The present disclosure relates to graphics processing, and more specifically to multi-GPU cooperation when rendering images for an application.

In recent years there has been a continued push for online services that enable online or cloud gaming in a streaming format between a cloud gaming server and a client connected over a network. The streaming format has become increasingly popular because it offers on-demand availability of game titles, the ability to run more complex games, networking between players for multiplayer gaming, sharing of assets between players, sharing of instant experiences between players and/or spectators, letting friends watch a friend play a video game, letting a friend join a friend's ongoing gameplay, and the like.

A cloud gaming server may be configured to provide resources to one or more clients and/or applications. That is, the cloud gaming server may be configured with resources capable of high throughput. For example, there are limits to the performance that an individual graphics processing unit (GPU) can attain. To render even more complex scenes, or to use even more complex algorithms (e.g., materials, lighting) when generating a scene, it may be desirable to use multiple GPUs to render a single image. However, using those GPUs equally is difficult to achieve. Further, even when multiple GPUs are available to process an image for an application using traditional technologies, there is no ability to support a corresponding increase in both screen pixel count and geometry density (e.g., four GPUs cannot write four times the pixels and/or process four times the vertices or primitives for an image).

It is against this background that embodiments of the present disclosure arise.

Embodiments of the present disclosure relate to using multiple GPUs (graphics processing units) in collaboration to render a single image, such as multi-GPU rendering of geometry for an application by performing geometry analysis while rendering to generate information used for the dynamic assignment of screen regions to GPUs for rendering of the image frame, and/or by performing geometry analysis prior to rendering, and/or by performing a timing analysis during a rendering phase in order to redistribute the assignment of GPU responsibilities during that rendering phase.

Embodiments of the present disclosure disclose a method for graphics processing. The method includes rendering graphics for an application using a plurality of graphics processing units (GPUs). The method includes using the plurality of GPUs in collaboration to render an image frame that includes a plurality of pieces of geometry. The method includes, during a pre-pass phase of rendering, generating at the GPUs information regarding the plurality of pieces of geometry and their relation to a plurality of screen regions. The method includes assigning the plurality of screen regions to the plurality of GPUs based on the information, for purposes of rendering the plurality of pieces of geometry in a subsequent phase of rendering.
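As a rough illustration of the assignment step, the following Python sketch distributes screen regions across GPUs using per-region cost estimates of the kind a pre-pass might produce. The cost metric (an approximate primitive count per region), the greedy balancing heuristic, and the names `assign_regions` and `region_cost` are assumptions made for illustration; the disclosure does not prescribe a particular balancing algorithm.

```python
import heapq


def assign_regions(region_cost, num_gpus):
    """Greedy balancing: hand the next most expensive region to the least-loaded GPU.

    region_cost maps a screen-region id to an estimated rendering cost
    (hypothetical pre-pass output). Returns a dict of region id -> GPU index.
    """
    # Min-heap of (accumulated cost, gpu index); ties go to the lower GPU index.
    heap = [(0, gpu) for gpu in range(num_gpus)]
    heapq.heapify(heap)
    assignment = {}
    # Visit regions from most to least expensive (ties broken by region id).
    for region in sorted(region_cost, key=lambda r: (-region_cost[r], r)):
        load, gpu = heapq.heappop(heap)
        assignment[region] = gpu
        heapq.heappush(heap, (load + region_cost[region], gpu))
    return assignment


# Hypothetical per-region primitive counts for one image frame.
region_cost = {0: 900, 1: 100, 2: 400, 3: 500, 4: 50, 5: 50}
assignment = assign_regions(region_cost, num_gpus=2)

# Total estimated cost that ends up on each GPU.
loads = [0, 0]
for region, gpu in assignment.items():
    loads[gpu] += region_cost[region]
```

Because the assignment is recomputed from the information generated for each image frame, the mapping of regions to GPUs can differ from one frame to the next.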

In other embodiments of the present disclosure, a computer system is disclosed that includes a processor and memory coupled to the processor, the memory having stored therein instructions that, when executed by the computer system, cause the computer system to execute a method for graphics processing. The method includes rendering graphics for an application using a plurality of graphics processing units (GPUs). The method includes using the plurality of GPUs in collaboration to render an image frame that includes a plurality of pieces of geometry. The method includes, during a pre-pass phase of rendering, generating at the GPUs information regarding the plurality of pieces of geometry and their relation to a plurality of screen regions. The method includes assigning the plurality of screen regions to the plurality of GPUs based on the information, for purposes of rendering the plurality of pieces of geometry in a subsequent phase of rendering.

Still other embodiments of the present disclosure disclose a non-transitory computer-readable medium storing a computer program for graphics processing. The computer-readable medium includes program instructions for rendering graphics for an application using a plurality of graphics processing units (GPUs). The computer-readable medium includes program instructions for using the plurality of GPUs in collaboration to render an image frame that includes a plurality of pieces of geometry. The computer-readable medium includes program instructions for generating at the GPUs, during a pre-pass phase of rendering, information regarding the plurality of pieces of geometry and their relation to a plurality of screen regions. The computer-readable medium includes program instructions for assigning the plurality of screen regions to the plurality of GPUs based on the information, for purposes of rendering the plurality of pieces of geometry in a subsequent phase of rendering.

Embodiments of the present disclosure disclose a method for graphics processing. The method includes rendering graphics for an application using a plurality of graphics processing units (GPUs). The method includes dividing responsibility for processing a plurality of pieces of geometry of an image frame during an analysis pre-pass phase of rendering among the plurality of GPUs, wherein each of the plurality of pieces of geometry is assigned to a corresponding GPU. The method includes determining, in the analysis pre-pass phase, the overlap of each of the plurality of pieces of geometry with each of a plurality of screen regions. The method includes generating, at the plurality of GPUs, information regarding the plurality of pieces of geometry and their relation to the plurality of screen regions, based on the overlap of each of the plurality of pieces of geometry with each of the plurality of screen regions. The method includes assigning the plurality of screen regions to the plurality of GPUs based on the information, for purposes of rendering the plurality of pieces of geometry during a subsequent phase of rendering.
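The analysis pre-pass described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation: each piece of geometry is reduced to a screen-space bounding box (an approximate overlap test), pieces are handed to GPUs round-robin, and the per-GPU loop is run sequentially on the CPU to stand in for work that the GPUs would perform in parallel. The names `Rect`, `overlap_area`, and `analysis_prepass` are hypothetical.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Axis-aligned screen-space rectangle, half-open: [x0, x1) x [y0, y1)."""
    x0: int
    y0: int
    x1: int
    y1: int


def overlap_area(a, b):
    """Area of the intersection of two rectangles (0 if they do not overlap)."""
    w = min(a.x1, b.x1) - max(a.x0, b.x0)
    h = min(a.y1, b.y1) - max(a.y0, b.y0)
    return max(w, 0) * max(h, 0)


def analysis_prepass(pieces, regions, num_gpus):
    """Return {(piece index, region index): overlap area} for overlapping pairs."""
    # Each simulated GPU analyses an interleaved (round-robin) subset of pieces.
    per_gpu = [{} for _ in range(num_gpus)]
    for idx, bbox in enumerate(pieces):
        gpu = idx % num_gpus
        for rid, region in enumerate(regions):
            area = overlap_area(bbox, region)
            if area > 0:
                per_gpu[gpu][(idx, rid)] = area
    # Merge the partial results so every GPU can see the full picture.
    info = {}
    for partial in per_gpu:
        info.update(partial)
    return info


# An 8x8 screen split into four 4x4 quadrants, and three pieces of geometry.
regions = [Rect(0, 0, 4, 4), Rect(4, 0, 8, 4), Rect(0, 4, 4, 8), Rect(4, 4, 8, 8)]
pieces = [Rect(1, 1, 3, 3), Rect(2, 2, 6, 6), Rect(5, 0, 8, 2)]
info = analysis_prepass(pieces, regions, num_gpus=2)
```

The resulting table is the kind of information that a later step could use to decide which GPU should own which screen region.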

In other embodiments of the present disclosure, a computer system is disclosed that includes a processor and memory coupled to the processor, the memory having stored therein instructions that, when executed by the computer system, cause the computer system to execute a method for graphics processing. The method includes rendering graphics for an application using a plurality of graphics processing units (GPUs). The method includes dividing responsibility for processing a plurality of pieces of geometry of an image frame during an analysis pre-pass phase of rendering among the plurality of GPUs, wherein each of the plurality of pieces of geometry is assigned to a corresponding GPU. The method includes determining, in the analysis pre-pass phase, the overlap of each of the plurality of pieces of geometry with each of a plurality of screen regions. The method includes generating, at the plurality of GPUs, information regarding the plurality of pieces of geometry and their relation to the plurality of screen regions, based on the overlap of each of the plurality of pieces of geometry with each of the plurality of screen regions. The method includes assigning the plurality of screen regions to the plurality of GPUs based on the information, for purposes of rendering the plurality of pieces of geometry during a subsequent phase of rendering.

Still other embodiments of the present disclosure disclose a non-transitory computer-readable medium storing a computer program for graphics processing. The computer-readable medium includes program instructions for rendering graphics for an application using a plurality of graphics processing units (GPUs). The computer-readable medium includes program instructions for dividing responsibility for processing a plurality of pieces of geometry of an image frame during an analysis pre-pass phase of rendering among the plurality of GPUs, wherein each of the plurality of pieces of geometry is assigned to a corresponding GPU. The computer-readable medium includes program instructions for determining, in the analysis pre-pass phase, the overlap of each of the plurality of pieces of geometry with each of a plurality of screen regions. The computer-readable medium includes program instructions for generating, at the plurality of GPUs, information regarding the plurality of pieces of geometry and their relation to the plurality of screen regions, based on the overlap of each of the plurality of pieces of geometry with each of the plurality of screen regions. The computer-readable medium includes program instructions for assigning the plurality of screen regions to the plurality of GPUs based on the information, for purposes of rendering the plurality of pieces of geometry during a subsequent phase of rendering.

Embodiments of the present disclosure disclose a method for graphics processing. The method includes rendering graphics for an application using a plurality of graphics processing units (GPUs). The method includes using the plurality of GPUs in collaboration to render an image frame that includes a plurality of pieces of geometry. The method includes, during the rendering of the image frame, subdividing one or more of the plurality of pieces of geometry into smaller pieces and dividing the responsibility for rendering these smaller portions of geometry among the plurality of GPUs, wherein each of the smaller portions of geometry is processed by a corresponding GPU. The method includes, for those pieces of geometry that are not subdivided, dividing the responsibility for rendering the pieces of geometry among the plurality of GPUs, wherein each of these pieces of geometry is processed by a corresponding GPU.
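The subdivision step can be illustrated with a short Python sketch. It assumes, purely for illustration, that a piece of geometry is identified by an id and a primitive count, that pieces larger than a threshold (`max_prims`) are cut into contiguous primitive ranges, and that the resulting work items are dealt to GPUs round-robin. The threshold, the round-robin policy, and the name `distribute` are hypothetical and are not taken from the disclosure.

```python
def distribute(pieces, num_gpus, max_prims):
    """Split large pieces into chunks and deal all work items out to GPUs.

    pieces: list of (piece_id, primitive_count).
    Returns one work list per GPU; each item is (piece_id, first_prim, count).
    """
    work = [[] for _ in range(num_gpus)]
    next_gpu = 0
    for piece_id, count in pieces:
        # A piece with at most max_prims primitives yields a single item (it is
        # not subdivided); a larger piece yields several smaller portions.
        for first in range(0, count, max_prims):
            n = min(max_prims, count - first)
            work[next_gpu].append((piece_id, first, n))
            next_gpu = (next_gpu + 1) % num_gpus
    return work


# Hypothetical draw-call output: piece "b" is large enough to be subdivided.
pieces = [("a", 100), ("b", 2500), ("c", 300)]
work = distribute(pieces, num_gpus=2, max_prims=1000)
```

In this example piece "b" is split into three portions that are spread over both GPUs, while "a" and "c" are each rendered whole by a single GPU.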

In other embodiments of the present disclosure, a computer system is disclosed that includes a processor and memory coupled to the processor, the memory having stored therein instructions that, when executed by the computer system, cause the computer system to execute a method for graphics processing. The method includes rendering graphics for an application using a plurality of graphics processing units (GPUs). The method includes using the plurality of GPUs in collaboration to render an image frame that includes a plurality of pieces of geometry. The method includes, during the rendering of the image frame, subdividing one or more of the plurality of pieces of geometry into smaller pieces and dividing the responsibility for rendering these smaller portions of geometry among the plurality of GPUs, wherein each of the smaller portions of geometry is processed by a corresponding GPU. The method includes, for those pieces of geometry that are not subdivided, dividing the responsibility for rendering the pieces of geometry among the plurality of GPUs, wherein each of these pieces of geometry is processed by a corresponding GPU.

Still other embodiments of the present disclosure disclose a non-transitory computer-readable medium storing a computer program for graphics processing. The computer-readable medium includes program instructions for rendering graphics for an application using a plurality of graphics processing units (GPUs). The computer-readable medium includes program instructions for using the plurality of GPUs in collaboration to render an image frame that includes a plurality of pieces of geometry. The computer-readable medium includes program instructions for subdividing, during the rendering of the image frame, one or more of the plurality of pieces of geometry into smaller pieces and dividing the responsibility for rendering these smaller portions of geometry among the plurality of GPUs, wherein each of the smaller portions of geometry is processed by a corresponding GPU. The computer-readable medium includes program instructions for dividing, for those pieces of geometry that are not subdivided, the responsibility for rendering the pieces of geometry among the plurality of GPUs, wherein each of these pieces of geometry is processed by a corresponding GPU.

Other aspects of the disclosure will become apparent from the following detailed description, taken in conjunction with the accompanying drawings, which illustrate by way of example the principles of the disclosure.

The disclosure may best be understood by reference to the following detailed description taken in conjunction with the accompanying drawings.

A diagram of a system for providing gaming over a network between one or more cloud gaming servers configured to implement multiple GPUs in collaboration to render a single image, including multi-GPU (graphics processing unit) rendering of geometry for an application by performing geometry analysis while rendering to generate information used for the dynamic assignment of screen regions to GPUs for further rendering passes of the image frame, and/or by performing geometry analysis prior to a rendering phase, and/or by subdividing pieces of geometry and assigning the resulting smaller portions of geometry to multiple GPUs, in accordance with embodiments of the present disclosure.

A diagram of a multi-GPU architecture in which multiple GPUs collaborate to render a single image, in accordance with one embodiment of the present disclosure.

A diagram of multiple graphics processing unit resources configured for multi-GPU rendering of geometry for an application by performing geometry analysis while rendering, and/or by performing geometry analysis prior to rendering, and/or by subdividing pieces of geometry and assigning the resulting smaller portions of geometry to multiple GPUs, in accordance with embodiments of the present disclosure.

A diagram of a rendering architecture implementing a graphics pipeline that is configured for multi-GPU processing, such that multiple GPUs collaborate to render a single image, in accordance with one embodiment of the present disclosure.

A diagram of a screen that is subdivided into quadrants when performing multi-GPU rendering, in accordance with one embodiment of the present disclosure.

A diagram of a screen that is subdivided into a plurality of interleaved regions when performing multi-GPU rendering, in accordance with one embodiment of the present disclosure.

An illustration of object testing against screen regions when multiple GPUs collaborate to render a single image, in accordance with one embodiment of the present disclosure.

An illustration of testing portions of an object against screen regions when multiple GPUs collaborate to render a single image, in accordance with one embodiment of the present disclosure.

A flow diagram illustrating a method for graphics processing including multi-GPU rendering of geometry for an application by performing geometry analysis while rendering, in accordance with one embodiment of the present disclosure.

A diagram of a screen illustrating the dynamic assignment of screen regions to GPUs for geometry rendering based on an analysis of the geometry of a current image frame performed while rendering the current image frame, in accordance with one embodiment of the present disclosure.

Three diagrams, each illustrating the rendering of an image frame that includes four objects, the rendering including a Z pre-pass phase and a geometry phase, where the Z pre-pass phase is performed to generate information used for the dynamic assignment of screen regions to GPUs for geometry rendering of the image frame, in accordance with one embodiment of the present disclosure.

An illustration of rendering an image frame using the dynamic assignment of screen regions to GPUs, based on whole objects or portions of objects, for geometry rendering, the assignment being based on an analysis of the geometry of the current image frame performed during a Z pre-pass phase of rendering while rendering the image frame, in accordance with one embodiment of the present disclosure.

A diagram illustrating the interleaving of GPU assignments to pieces of geometry of an image frame for purposes of performing a Z pre-pass phase of rendering to generate information used for the dynamic assignment of screen regions to GPUs for geometry rendering of the image frame, in accordance with one embodiment of the present disclosure.

A flow diagram illustrating a method for graphics processing including multi-GPU rendering of geometry for an application by performing geometry analysis prior to rendering, in accordance with one embodiment of the present disclosure.

A diagram illustrating an analysis pre-pass performed before a rendering phase for an image frame, the analysis pre-pass generating information used for the dynamic assignment of screen regions to GPUs for geometry rendering of the image frame, in accordance with one embodiment of the present disclosure.

A diagram illustrating the calculation of an accurate overlap between a piece of geometry and a screen region when performing an analysis pre-pass to generate information used for the dynamic assignment of screen regions to GPUs for geometry rendering of an image frame, in accordance with one embodiment of the present disclosure.

A pair of diagrams illustrating the calculation of an approximate overlap between a piece of geometry and a screen region when performing an analysis pre-pass to generate information used for the dynamic assignment of screen regions to GPUs for geometry rendering of an image frame, in accordance with one embodiment of the present disclosure.

A flow diagram illustrating a method for graphics processing including multi-GPU rendering of geometry for an application by performing a timing analysis during a rendering or analysis phase in order to redistribute the assignment of GPU responsibilities during that phase, such as when performing a Z pre-pass phase on pieces of geometry to generate information used for the dynamic assignment of screen regions to GPUs for geometry rendering of an image frame, in accordance with one embodiment of the present disclosure.

A diagram illustrating various distributions of GPU assignments for performing a Z pre-pass phase of rendering to generate information used for the dynamic assignment of screen regions to GPUs for geometry rendering of an image frame, in accordance with one embodiment of the present disclosure.

A diagram illustrating the use of multiple GPUs to render pieces of geometry in a screen region, in accordance with one embodiment of the present disclosure.

A diagram illustrating the rendering of pieces of geometry out of order with respect to their corresponding draw calls, in accordance with one embodiment of the present disclosure.

An illustration of components of an example device that can be used to perform aspects of the various embodiments of the present disclosure.

Although the following detailed description contains many specific details for the purposes of illustration, anyone of ordinary skill in the art will appreciate that many variations and alterations to the following details are within the scope of the present disclosure. Accordingly, the aspects of the present disclosure described below are set forth without any loss of generality to, and without imposing limitations upon, the claims that follow this description.

Generally speaking, there are limits to the performance that an individual GPU can achieve, derived, for example, from limits on how large a GPU can be. To render even more complex scenes, or to use even more complex algorithms (e.g., materials, lighting, etc.), it is desirable to use multiple GPUs in cooperation to generate and/or render a single image frame. For example, responsibility for rendering is divided among the multiple GPUs based on information determined through geometry analysis of objects and/or pieces of geometry (e.g., portions of objects, primitives, polygons, vertices, etc.) in the image frame. This information provides the relationships between the geometry and each of the screen regions, which may be interleaved. This allows the GPUs to render the geometry more efficiently, or to avoid rendering it altogether. In particular, various embodiments of the present disclosure provide for analysis of the geometry of an image frame, and dynamically and flexibly assign responsibility for rendering the image frame among the GPUs, such that each GPU ends up responsible for a set of screen regions that is unique to that image frame (i.e., the next image frame may have a different association of GPUs to screen regions). Through geometry analysis and dynamic assignment of rendering responsibility to the GPUs per image frame, embodiments of the present disclosure support an increase in pixel count (i.e., resolution) and complexity, and/or an increase in geometric complexity, and/or an increase in the amount of processing per vertex and/or primitive. Specifically, various embodiments of the present disclosure describe methods and systems configured for performing multi-GPU rendering of geometry for an application by performing geometry analysis while rendering in order to dynamically assign screen regions to GPUs for geometry rendering of the image frame, wherein the geometry analysis is based on information defining relationships between the geometry to be rendered for the image frame and the screen regions. For example, the information for the geometry analysis is generated while rendering, such as during a Z pre-pass performed before geometry rendering. In particular, hardware is configured so that the pre-pass generates the information that is used to assist in the intelligent assignment of screen regions to the GPUs when performing geometry rendering during a subsequent phase of rendering. Other embodiments of the present disclosure describe methods and systems configured for performing multi-GPU rendering of geometry for an application by performing geometry analysis prior to a phase of rendering in order to dynamically assign screen regions to GPUs for that phase of rendering of the image frame, wherein the geometry analysis is based on information defining relationships between the geometry to be rendered for the image frame and the screen regions. For example, the information is generated in a pre-pass performed before rendering, such as by using shaders (e.g., software). The information is used for the intelligent assignment of screen regions to the GPUs when performing the geometry rendering. Still other embodiments of the present disclosure describe methods and systems configured for subdividing pieces of geometry, such as those processed or produced by draw calls, into smaller portions of geometry, and assigning those smaller portions of geometry to multiple GPUs for rendering, wherein each smaller portion of geometry is assigned to a GPU. As an advantage, for example, this allows the multiple GPUs to render more complex scenes and/or images in the same amount of time.
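The flow from geometry analysis to per-frame assignment of screen regions can be illustrated with a minimal Python sketch. It is not the disclosed implementation. It assumes, purely for illustration, that each piece of geometry is summarized by a screen-space bounding box, that the cost of a region is the number of pieces overlapping it, and that regions are handed out greedily to the least-loaded GPU. The names `analyze_geometry` and `assign_regions` are hypothetical.

```python
from collections import defaultdict


def analyze_geometry(pieces, grid_w, grid_h, region_size):
    """Count how many pieces of geometry overlap each screen region.

    Each piece is a hypothetical (min_x, min_y, max_x, max_y) bounding box
    in pixels. A real pre-pass would use actual coverage, not boxes.
    """
    cost = defaultdict(int)
    for min_x, min_y, max_x, max_y in pieces:
        # Clamp the box to the region grid, then mark every region it touches.
        x0 = max(0, min_x // region_size)
        y0 = max(0, min_y // region_size)
        x1 = min(grid_w - 1, max_x // region_size)
        y1 = min(grid_h - 1, max_y // region_size)
        for ry in range(y0, y1 + 1):
            for rx in range(x0, x1 + 1):
                cost[(rx, ry)] += 1
    return cost


def assign_regions(cost, grid_w, grid_h, num_gpus):
    """Greedy balancing (an assumed policy): give the most expensive
    unassigned region to the GPU with the smallest load so far."""
    load = [0] * num_gpus
    owner = {}
    regions = [(rx, ry) for ry in range(grid_h) for rx in range(grid_w)]
    regions.sort(key=lambda r: cost.get(r, 0), reverse=True)
    for region in regions:
        gpu = load.index(min(load))
        owner[region] = gpu
        load[gpu] += cost.get(region, 0)
    return owner, load


# Four pieces of geometry on a 4x4 grid of 64-pixel regions, four GPUs.
pieces = [(0, 0, 63, 63), (0, 0, 127, 127), (200, 200, 255, 255), (64, 0, 255, 63)]
cost = analyze_geometry(pieces, 4, 4, 64)
owner, load = assign_regions(cost, 4, 4, 4)
```

Because `owner` is recomputed from the geometry of each frame, the association of GPUs to screen regions may differ from one image frame to the next. The sketch balances geometry cost only; a fuller model would also weigh per-pixel work, since regions with no geometry all land on a single GPU here.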

With the above general understanding of the various embodiments, example details of the embodiments will now be described with reference to the various drawings.

Throughout this specification, a reference to an "application," "game," "video game," or "gaming application" is meant to represent any type of interactive application that is directed through the execution of input commands. For illustration purposes only, an interactive application includes applications for gaming, word processing, video processing, video game processing, and the like. Furthermore, these terms are interchangeable.

Throughout this specification, various embodiments of the present disclosure are described for multi-GPU processing or rendering of geometry for an application using an exemplary architecture having four GPUs. However, it is understood that any number of GPUs (e.g., two or more GPUs) may cooperate when rendering geometry for an application.

FIG. 1 is a diagram of a system for performing multi-GPU processing when rendering an image (e.g., an image frame) for an application, in accordance with one embodiment of the present disclosure. The system is configured to provide gaming over a network between one or more cloud gaming servers and, more specifically, is configured for the cooperation of multiple GPUs to render a single image of an application, in accordance with embodiments of the present disclosure. Such cooperation occurs, for example, when performing geometry analysis of pieces of geometry of an image frame while rendering or prior to rendering in order to dynamically assign screen regions to GPUs for geometry rendering of the image frame, and/or when subdividing pieces of geometry, such as those processed or produced by draw calls, into smaller portions of geometry and assigning those smaller portions of geometry to multiple GPUs for rendering, wherein each smaller portion of geometry is assigned to a GPU. Cloud gaming includes executing a video game at a server to generate game-rendered video frames, which are then sent to a client for display. In particular, system 100 is configured for efficient multi-GPU rendering of geometry for an application by pretesting against interleaved screen regions before rendering.
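The subdivision of a piece of geometry into smaller portions, each assigned to a GPU, can be sketched as follows. This is an illustrative sketch only: the fixed portion size, the round-robin policy, and the function names `subdivide` and `distribute` are assumptions and are not taken from the disclosure.

```python
def subdivide(primitives, portion_size):
    """Split the primitives of one draw call into consecutive smaller portions."""
    return [primitives[i:i + portion_size]
            for i in range(0, len(primitives), portion_size)]


def distribute(portions, num_gpus):
    """Assign each portion to exactly one GPU (round-robin, an assumed policy)."""
    work = {gpu: [] for gpu in range(num_gpus)}
    for index, portion in enumerate(portions):
        work[index % num_gpus].append(portion)
    return work


draw_call = list(range(10))          # ten hypothetical primitive IDs
portions = subdivide(draw_call, 3)   # portions of at most three primitives
work = distribute(portions, 4)       # four GPUs, as in the example architecture
```

Every primitive ends up in exactly one portion, and every portion on exactly one GPU, which is the property the paragraph above describes. How large the portions are and which GPU receives each one are design choices left open here.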

Although FIG. 1 illustrates an implementation of multi-GPU rendering of geometry between one or more cloud gaming servers of a cloud gaming system, other embodiments of the present disclosure provide efficient multi-GPU rendering of geometry for an application by performing region testing while rendering within a stand-alone system, such as a personal computer or gaming console that includes a high-end graphics card having multiple GPUs.

It is also understood that multi-GPU rendering of geometry may be performed using physical GPUs, virtual GPUs, or a combination of both, in various embodiments (e.g., in a cloud gaming environment or within a stand-alone system). For example, virtual machines (e.g., instances) may be created using a hypervisor of host hardware (e.g., located at a data center) that utilizes one or more components of a hardware layer, such as multiple CPUs, memory modules, GPUs, network interfaces, communication components, and so on. These physical resources may be arranged in racks, such as racks of CPUs, racks of GPUs, and racks of memory, and the physical resources in the racks may be accessed using top-of-rack switches that facilitate a fabric for assembling and accessing the components used for an instance (e.g., when building the virtualized components of the instance). Generally, a hypervisor can present multiple guest operating systems of multiple instances that are configured with virtual resources. That is, each of the operating systems may be configured with a corresponding set of virtualized resources supported by one or more hardware resources (e.g., located at a corresponding data center). For example, each operating system may be supported with a virtual CPU, multiple virtual GPUs, virtual memory, virtualized communication components, and so on. In addition, the configuration of an instance may be transferred from one data center to another data center to reduce latency. GPU utilization defined for a user or game can be used when saving a user's gaming session. The GPU utilization can include any number of the configurations described herein to optimize the fast rendering of video frames for a gaming session. In one embodiment, the GPU utilization defined for the game or the user can be transferred between data centers as a configurable setting. The ability to transfer the GPU utilization setting enables efficient migration of game play from data center to data center when the user connects to play games from different geographic locations.
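As a rough illustration of GPU utilization carried between data centers as a configurable setting, the following Python sketch serializes a small settings record and restores it on the receiving side. The record fields (`game_id`, `num_gpus`, `use_virtual_gpus`) and the JSON wire format are assumptions made for the example; the disclosure does not specify either.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class GpuUtilization:
    """Hypothetical per-game GPU utilization setting."""
    game_id: str
    num_gpus: int
    use_virtual_gpus: bool


def export_setting(setting):
    """Serialize the setting at the source data center."""
    return json.dumps(asdict(setting), sort_keys=True)


def import_setting(payload):
    """Rebuild the setting at the destination data center."""
    return GpuUtilization(**json.loads(payload))


original = GpuUtilization(game_id="example-title", num_gpus=4, use_virtual_gpus=True)
payload = export_setting(original)
restored = import_setting(payload)
```

After the round trip, `restored` equals `original`, so a session resumed at the destination data center can be configured with the same GPU utilization as before.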

System 100 provides gaming via a cloud gaming network 190, wherein, in accordance with one embodiment of the present disclosure, the game is executed remotely from the client device 110 (e.g., a thin client) of a corresponding user who is playing the game. System 100 may provide gaming control to one or more users playing one or more games through the cloud gaming network 190 via network 150, in either single-player or multi-player mode. In some embodiments, the cloud gaming network 190 may include a plurality of virtual machines (VMs) running on a hypervisor of a host machine, with one or more virtual machines configured to execute a game processor module utilizing the hardware resources available to the hypervisor of the host. Network 150 may include one or more communication technologies. In some embodiments, network 150 may include fifth generation (5G) network technology having advanced wireless communication systems.

In some embodiments, communication may be facilitated using wireless technologies. Such technologies may include, for example, 5G wireless communication technologies. 5G is the fifth generation of cellular network technology. 5G networks are digital cellular networks, in which the service area covered by providers is divided into small geographical areas called cells. Analog signals representing sounds and images are digitized in the telephone, converted by an analog-to-digital converter, and transmitted as a stream of bits. All the 5G wireless devices in a cell communicate by radio waves with a local antenna array and a low-power automated transceiver (transmitter and receiver) in the cell, over frequency channels assigned by the transceiver from a pool of frequencies that are reused in other cells. The local antennas are connected with the telephone network and the Internet by a high-bandwidth optical fiber or wireless backhaul connection. As in other cell networks, a mobile device crossing from one cell to another is automatically transferred to the new cell. It should be understood that 5G networks are just one example type of communication network, and embodiments of the present disclosure may utilize earlier-generation wireless or wired communication, as well as later-generation wired or wireless technologies that come after 5G.

As shown, the cloud gaming network 190 includes a game server 160 that provides access to a plurality of video games. Game server 160 may be any type of server computing device available in the cloud, and may be configured as one or more virtual machines executing on one or more hosts. For example, game server 160 may manage a virtual machine supporting a game processor that instantiates an instance of a game for a user. As such, a plurality of game processors of game server 160 associated with a plurality of virtual machines is configured to execute multiple instances of one or more games associated with the gameplay of a plurality of users. In that manner, back-end server support provides streaming of media (e.g., video, audio, etc.) of the gameplay of a plurality of gaming applications to a plurality of corresponding users. That is, game server 160 is configured to stream data (e.g., rendered images and/or frames of a corresponding gameplay) back to a corresponding client device 110 through network 150. In that manner, a computationally complex gaming application may continue executing at the back-end server in response to controller inputs received and forwarded by client device 110. Each server is able to render images and/or frames, which are then encoded (e.g., compressed) and streamed to the corresponding client device for display.

For example, a plurality of users may access the cloud gaming network 190 via communication network 150 using corresponding client devices 110 configured for receiving streaming media. In one embodiment, client device 110 may be configured as a thin client that provides an interface with a back-end server (e.g., cloud gaming network 190) configured for providing computational functionality (e.g., including game title processing engine 111). In another embodiment, client device 110 may be configured with a game title processing engine and game logic for at least some local processing of a video game, and may be further utilized for receiving streaming content generated by the video game executing at a back-end server, or for other content provided by back-end server support. For local processing, the game title processing engine includes basic processor-based functions for executing a video game and services associated with the video game. In that case, the game logic may be stored on the local client device 110 and is used for executing the video game.

Each of the client devices 110 may be requesting access to different games from the cloud gaming network. For example, cloud gaming network 190 may be executing one or more game logics that are built upon a game title processing engine 111, as executed using the CPU resources 163 and GPU resources 365 of the game server 160. For example, game logic 115a in cooperation with game title processing engine 111 may be executing on game server 160 for one client, game logic 115b in cooperation with game title processing engine 111 may be executing on game server 160 for a second client, and game logic 115n in cooperation with game title processing engine 111 may be executing on game server 160 for an Nth client.

In particular, client device 110 of a corresponding user (not shown) is configured for requesting access to games over a communication network 150, such as the Internet, and for rendering display images (e.g., image frames) generated by a video game executed by the game server 160, wherein encoded images are delivered to the client device 110 for display in association with the corresponding user. For example, the user may be interacting through client device 110 with an instance of a video game executing on a game processor of game server 160. More particularly, an instance of the video game is executed by the game title processing engine 111. Corresponding game logic (e.g., executable code) 115 implementing the video game is stored and accessible through a data store (not shown), and is used to execute the video game. Game title processing engine 111 is able to support a plurality of video games using a plurality of game logics (e.g., gaming applications), each of which is selectable by the user.

For example, client device 110 is configured to interact with the game title processing engine 111 in association with the gameplay of a corresponding user, such as through input commands that are used to drive the gameplay. In particular, client device 110 may receive input from various types of input devices, such as game controllers, tablet computers, and keyboards, as well as gestures captured by video cameras, mice, touch pads, and the like. Client device 110 can be any type of computing device having at least a memory and a processor module, and is capable of connecting to the game server 160 over network 150. The back-end game title processing engine 111 is configured for generating rendered images, which are delivered over network 150 for display at a corresponding display in association with client device 110. For example, through cloud-based services, the game-rendered images may be delivered by an instance of a corresponding game (e.g., game logic) executing on game execution engine 111 of game server 160. That is, client device 110 is configured for receiving encoded images (e.g., encoded from game-rendered images generated through execution of a video game), and for displaying the rendered images on display 11. In one embodiment, display 11 includes an HMD (e.g., displaying VR content). In some embodiments, the rendered images may be streamed to a smartphone or tablet, wirelessly or by wire, directly from the cloud-based services or via the client device 110 (e.g., PlayStation® Remote Play).

In one embodiment, game server 160 and/or the game title processing engine 111 includes basic processor-based functions for executing the game and services associated with the gaming application. For example, game server 160 includes central processing unit (CPU) resources 163 and graphics processing unit (GPU) resources 365 that are configured for performing processor-based functions including 2D or 3D rendering, physics simulation, scripting, audio, animation, graphics processing, lighting, shading, rasterization, ray tracing, shadowing, culling, transformation, artificial intelligence, and the like. In addition, the CPU and GPU group may implement services for the gaming application, including, in part, memory management, multi-thread management, quality of service (QoS), bandwidth testing, social networking, management of social friends, communication with social networks of friends, communication channels, texting, instant messaging, chat support, and the like. In one embodiment, one or more applications share a particular GPU resource. In one embodiment, multiple GPU devices may be combined to perform graphics processing for a single application that is executing on a corresponding CPU.

In one embodiment, cloud gaming network 190 is a distributed game server system and/or architecture. In particular, a distributed game engine executing game logic is configured as a corresponding instance of a corresponding game. In general, the distributed game engine takes each of the functions of a game engine and distributes those functions for execution by a multitude of processing entities. Individual functions can be further distributed across one or more processing entities. The processing entities may be configured in different configurations, including as physical hardware, and/or as virtual components or virtual machines, and/or as virtual containers, wherein a container differs from a virtual machine in that it virtualizes an instance of the gaming application running on a virtualized operating system. The processing entities may utilize and/or rely on servers and their underlying hardware on one or more servers (compute nodes) of the cloud gaming network 190, wherein the servers may be located on one or more racks. The coordination, assignment, and management of the execution of those functions across the various processing entities are performed by a distributed synchronization layer. In that manner, execution of those functions is controlled by the distributed synchronization layer to enable generation of media (e.g., video frames, audio, etc.) for the gaming application in response to controller input by a player. The distributed synchronization layer is able to execute those functions efficiently across the distributed processing entities (e.g., through load balancing), such that critical game engine components and functions are distributed and reassembled for more efficient processing.

FIG. 2 is a diagram of an exemplary multi-GPU architecture 200 in which multiple GPUs cooperate to render a single image of a corresponding application, in accordance with one embodiment of the present disclosure. According to various embodiments of the present disclosure, multi-GPU architecture 200 is configured to perform geometry analysis of pieces of geometry of an image frame while rendering or prior to rendering in order to dynamically assign screen regions to GPUs for geometry rendering of the image frame, and/or when subdividing pieces of geometry, such as those processed or produced by draw calls, into smaller portions of geometry and assigning those smaller portions of geometry to multiple GPUs for rendering, wherein each smaller portion of geometry is assigned to a GPU. It is understood that many architectures are possible in various embodiments of the present disclosure in which multiple GPUs cooperate to render a single image, though they are not explicitly described or shown. For example, multi-GPU rendering of geometry for an application by performing region testing while rendering may be implemented between one or more cloud gaming servers of a cloud gaming system, or may be implemented within a stand-alone system, such as a personal computer or gaming console that includes a high-end graphics card having multiple GPUs.

Multi-GPU architecture 200 includes a CPU 163 and multiple GPUs configured for multi-GPU rendering of a single image (also referred to as an "image frame") for an application, and/or of each image in a sequence of images for the application. In particular, CPU 163 and GPU resources 365 are configured for performing processor-based functions including 2D or 3D rendering, physics simulation, scripting, audio, animation, graphics processing, lighting, shading, rasterization, ray tracing, shadowing, culling, transformation, artificial intelligence, and the like, as previously described.

For example, four GPUs are shown in GPU resources 365 of the multi-GPU architecture 200, though any number of GPUs may be utilized when rendering images for an application. Each GPU is connected via a high-speed bus 220 to a corresponding dedicated memory, such as random access memory (RAM). In particular, GPU-A is connected to memory 210A (e.g., RAM) via bus 220, GPU-B is connected to memory 210B (e.g., RAM) via bus 220, GPU-C is connected to memory 210C (e.g., RAM) via bus 220, and GPU-D is connected to memory 210D (e.g., RAM) via bus 220.

Further, the GPUs are connected to one another via bus 240, which, depending on the architecture, may be approximately equal in speed to or slower than bus 220 used for communication between a corresponding GPU and its corresponding memory. For example, GPU-A is connected to each of GPU-B, GPU-C, and GPU-D via bus 240. Also, GPU-B is connected to each of GPU-A, GPU-C, and GPU-D via bus 240. In addition, GPU-C is connected to each of GPU-A, GPU-B, and GPU-D via bus 240. Further, GPU-D is connected to each of GPU-A, GPU-B, and GPU-C via bus 240.

CPU 163 connects to each of the GPUs via a lower-speed bus 230 (e.g., bus 230 is slower than bus 220 used for communication between a corresponding GPU and its corresponding memory). In particular, CPU 163 is connected to each of GPU-A, GPU-B, GPU-C, and GPU-D.
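The connectivity described for FIG. 2 can be summarized in a small Python sketch: each GPU has a link to its dedicated memory over bus 220, a link to CPU 163 over bus 230, and a link to every other GPU over bus 240. The `Link` record and the `build_links` helper are hypothetical names used only for illustration; the reference numerals are taken from the description.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Link:
    endpoint_a: str
    endpoint_b: str
    bus: int  # reference numeral of the bus in FIG. 2


def build_links(gpu_ids):
    """Enumerate the links for the given GPU suffixes (e.g., "ABCD")."""
    links = []
    for gpu_id in gpu_ids:
        # Dedicated memory over the high-speed bus 220.
        links.append(Link(f"GPU-{gpu_id}", f"memory 210{gpu_id}", 220))
        # CPU connection over the lower-speed bus 230.
        links.append(Link("CPU 163", f"GPU-{gpu_id}", 230))
    # Every pair of GPUs is connected over bus 240.
    for first, second in combinations(gpu_ids, 2):
        links.append(Link(f"GPU-{first}", f"GPU-{second}", 240))
    return links


links = build_links("ABCD")
per_bus = {bus: sum(1 for link in links if link.bus == bus) for bus in (220, 230, 240)}
```

For four GPUs this yields four memory links, four CPU links, and six GPU-to-GPU links, matching the pairwise connections listed above.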

In some embodiments, the four GPUs are discrete GPUs, each on its own silicon die. In other embodiments, the four GPUs may share a die in order to take advantage of high-speed interconnects and other units on the die. In still other embodiments, there is one physical GPU 250 that can be configured to be used either as a single more powerful GPU or as four less powerful "virtual" GPUs (GPU-A, GPU-B, GPU-C, and GPU-D). That is, each of GPU-A, GPU-B, GPU-C, and GPU-D has sufficient functionality to operate a graphics pipeline (as shown in FIG. 4), the chip as a whole can also operate a graphics pipeline (as shown in FIG. 4), and the configuration can be flexibly switched between the two configurations (e.g., between rendering passes).
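Switching one physical GPU between a single-GPU configuration and a four-virtual-GPU configuration can be modeled with a short Python sketch. The notion of evenly divided "units" and the `PhysicalGpu` class are assumptions introduced for the example; the disclosure states only that the configuration can be switched, for example between rendering passes.

```python
from dataclasses import dataclass


@dataclass
class PhysicalGpu:
    """Hypothetical model of physical GPU 250."""
    total_units: int     # assumed count of execution units on the chip
    partitions: int = 1  # 1 = single GPU, 4 = four virtual GPUs

    def configure(self, partitions):
        """Select a configuration, e.g., between two rendering passes."""
        if partitions < 1 or self.total_units % partitions != 0:
            raise ValueError("units must divide evenly among virtual GPUs")
        self.partitions = partitions

    def virtual_gpus(self):
        """Return (name, units) for each GPU visible in this configuration."""
        per_gpu = self.total_units // self.partitions
        if self.partitions == 1:
            return [("GPU 250", per_gpu)]
        return [(f"GPU-{chr(ord('A') + i)}", per_gpu)
                for i in range(self.partitions)]


gpu = PhysicalGpu(total_units=32)
gpu.configure(4)             # one pass rendered by four virtual GPUs
split = gpu.virtual_gpus()
gpu.configure(1)             # a later pass rendered by the whole chip
merged = gpu.virtual_gpus()
```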

FIG. 3 is a diagram of graphics processing unit resources 365 configured for multi-GPU rendering of geometry for an image frame generated by an application, in accordance with various embodiments of the present disclosure. The rendering is performed by carrying out geometry analysis of pieces of geometry of the image frame while rendering or prior to rendering in order to dynamically assign screen regions to GPUs for geometry rendering of the image frame, and/or when subdividing pieces of geometry, such as those processed or produced by draw calls, into smaller portions of geometry and assigning those smaller portions of geometry to multiple GPUs for rendering, wherein each smaller portion of geometry is assigned to a GPU. For example, game server 160 may be configured to include GPU resources 365 in the cloud gaming network 190 of FIG. 1. As shown, GPU resources 365 include multiple GPUs, such as GPU 365a, GPU 365b, ... GPU 365n. As previously described, various architectures may include multiple GPUs that cooperate to render a single image by performing multi-GPU rendering of geometry for an application through region testing while rendering, such as implementing multi-GPU rendering of geometry between one or more cloud gaming servers of a cloud gaming system, or implementing multi-GPU rendering of geometry within a stand-alone system, such as a personal computer or gaming console that includes a high-end graphics card having multiple GPUs.

In particular, in one embodiment, game server 160 is configured to perform multi-GPU processing when rendering a single image of an application, such that multiple GPUs cooperate to render a single image and/or to render each of one or more images of a sequence of images when executing the application. For example, in one embodiment, game server 160 may include a CPU and GPU group that is configured to perform multi-GPU rendering of each of one or more images in a sequence of images of the application, wherein one CPU and GPU group may be implementing a graphics and/or rendering pipeline for the application. The CPU and GPU group may be configured as one or more processing devices. As previously described, the CPU and GPU group may include CPU 163 and GPU resources 365, which are configured for performing processor-based functions including 2D or 3D rendering, physics simulation, scripting, audio, animation, graphics processing, lighting, shading, rasterization, ray tracing, shadowing, culling, transformation, artificial intelligence, and the like.

GPU resources 365 are responsible for and/or configured for rendering of objects (e.g., writing color or normal vector values for a pixel of the object to multiple render targets (MRTs)) and for execution of synchronous compute kernels (e.g., full-screen effects on the resulting MRTs). The synchronous compute to perform and the objects to render are specified through commands contained in multiple rendering command buffers 325 that the GPUs will execute. In particular, GPU resources 365 are configured to render objects and perform synchronous compute (e.g., during the execution of synchronous compute kernels) when executing commands from the rendering command buffers 325, where commands and/or operations may be dependent on other operations such that they are performed in sequence.

For example, GPU resources 365 are configured to perform synchronous compute and/or rendering of objects using one or more rendering command buffers 325 (e.g., rendering command buffer 325a, rendering command buffer 325b ... rendering command buffer 325n). In one embodiment, each GPU in GPU resources 365 may have its own command buffer. Alternatively, when substantially the same set of objects is being rendered by each GPU (e.g., because the regions are small), the GPUs in GPU resources 365 may use the same command buffer or the same set of command buffers. Further, each GPU in GPU resources 365 may support the ability for a command to be executed by one GPU but not by another. For instance, flags on a draw command, or predication in the rendering command buffer, allow a single GPU to execute one or more commands in the corresponding command buffer while the other GPUs ignore those commands. For example, rendering command buffer 325a may support flags 330a, rendering command buffer 325b may support flags 330b, and rendering command buffer 325n may support flags 330n.
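For purposes of illustration only, the idea of a shared command buffer in which flags let one GPU execute a command while the other GPUs ignore it may be sketched as follows. The `Command` type, the bit-mask encoding of the flags, and the command names are hypothetical and are not taken from the disclosure; the sketch only shows the filtering behavior.

```python
from dataclasses import dataclass


@dataclass
class Command:
    name: str
    gpu_mask: int  # hypothetical flag encoding: bit i set => GPU i executes the command


def execute(command_buffer, gpu_index):
    """Return the names of the commands that GPU `gpu_index` actually runs."""
    bit = 1 << gpu_index
    return [cmd.name for cmd in command_buffer if cmd.gpu_mask & bit]


ALL_GPUS = 0b1111  # four GPUs, indices 0..3

# One buffer shared by all four GPUs (hypothetical contents).
shared_buffer = [
    Command("set_render_targets", ALL_GPUS),  # configuration: every GPU
    Command("draw_object_511", 0b0100),       # only GPU 2
    Command("draw_object_512", 0b1100),       # GPUs 2 and 3
    Command("fullscreen_effect", ALL_GPUS),   # synchronous compute: every GPU
]

for gpu in range(4):
    print(gpu, execute(shared_buffer, gpu))
```

In this sketch, GPU 0 walks the same buffer as the other GPUs but skips both draw commands, which corresponds to the case in which other GPUs ignore a command.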

Performance of synchronous compute (e.g., execution of synchronous compute kernels) and rendering of objects are part of the overall rendering. For example, if the video game is running at 60 Hz (i.e., 60 frames per second), then all object rendering and execution of synchronous compute kernels for an image frame typically must complete within approximately 16.67 ms (i.e., one frame at 60 Hz). As previously described, operations performed when rendering objects and/or executing synchronous compute kernels are ordered, such that operations may be dependent on other operations (e.g., commands in a rendering command buffer may need to complete execution before other commands in that rendering command buffer can execute).

In particular, each of the rendering command buffers 325 contains commands of various types, including commands that affect a corresponding GPU configuration (e.g., commands that specify the location and format of a render target), as well as commands to render objects and/or execute synchronous compute kernels. For purposes of illustration, synchronous compute performed when executing synchronous compute kernels may include performing full-screen effects once the objects have all been rendered to one or more corresponding multiple render targets (MRTs).

In addition, when GPU resources 365 render objects for an image frame and/or execute synchronous compute kernels when generating the image frame, GPU resources 365 are configured via the registers of each GPU 365a, 365b ... 365n. For example, GPU 365a is configured via its registers 340 (e.g., register 340a, register 340b ... register 340n) to perform its rendering or compute kernel execution in a certain way. That is, the values stored in registers 340 define the hardware context (e.g., GPU configuration or GPU state) for GPU 365a when executing commands in rendering command buffers 325 used for rendering objects and/or executing synchronous compute kernels for an image frame. Each of the GPUs in GPU resources 365 may be similarly configured: GPU 365b is configured via its registers 350 (e.g., register 350a, register 350b ... register 350n) to perform its rendering or compute kernel execution in a certain way, and GPU 365n is configured via its registers 370 (e.g., register 370a, register 370b ... register 370n) to perform its rendering or compute kernel execution in a certain way.

Some examples of GPU configuration include the location and format of render targets (e.g., MRTs). Other examples of GPU configuration include operating procedures. For instance, when rendering an object, the Z-value of each pixel of the object can be compared to the Z-buffer in various ways. For example, the object pixel is written only if the object Z-value matches the value in the Z-buffer. Alternatively, the object pixel could be written only if the object Z-value is the same as or less than the value in the Z-buffer. The type of test being performed is defined within the GPU configuration.
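For purposes of illustration only, the two Z-buffer tests mentioned above may be sketched as a configurable comparison. The names `DEPTH_TESTS` and `write_pixel` are hypothetical, and the sketch assumes that a passing pixel also updates the Z-buffer, which the disclosure does not state.

```python
import operator

# Hypothetical names for the two tests described above.
DEPTH_TESTS = {
    "equal": operator.eq,       # write only if Z matches the Z-buffer value
    "less_equal": operator.le,  # write only if Z is the same as or less than it
}


def write_pixel(color_buffer, z_buffer, x, y, color, z, depth_test):
    """Write one pixel if it passes the configured test; return True if written."""
    passes = DEPTH_TESTS[depth_test](z, z_buffer[y][x])
    if passes:
        color_buffer[y][x] = color
        z_buffer[y][x] = z  # assumption: depth is updated on a passing write
    return passes


color_buffer = [[None]]
z_buffer = [[0.5]]
print(write_pixel(color_buffer, z_buffer, 0, 0, "red", 0.4, "equal"))
print(write_pixel(color_buffer, z_buffer, 0, 0, "red", 0.4, "less_equal"))
```

The same pixel is rejected under the "equal" test and accepted under the "less_equal" test, so the result depends only on which test the GPU configuration selects.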

FIG. 4 is a simplified diagram of a rendering architecture implementing a graphics pipeline 400 that is configured for multi-GPU processing, such that multiple GPUs collaborate to render a single image, in accordance with one embodiment of the present disclosure. Graphics pipeline 400 is illustrative of the general process for rendering images using 3D (three-dimensional) polygon rendering processes. The graphics pipeline 400 for a rendered image outputs corresponding color information for each of the pixels in a display, where the color information may represent texture and shading (e.g., color, shadowing, etc.). Graphics pipeline 400 may be implementable within the client device 110, game server 160, game title processing engine 111, and/or GPU resources 365 of FIGS. 1 and 3. That is, various architectures may include multiple GPUs collaborating to render a single image by performing multi-GPU rendering of geometry for an application through region testing while rendering, such as implementing multi-GPU rendering of geometry between one or more cloud gaming servers of a cloud gaming system, or implementing multi-GPU rendering of geometry within a stand-alone system, such as a personal computer or gaming console that includes a high-end graphics card having multiple GPUs.

As shown, the graphics pipeline receives input geometry 405. For example, the geometry processing stage 410 receives the input geometry 405. The input geometry 405 may include vertices within a 3D gaming world, and information corresponding to each of the vertices. A given object within the gaming world can be represented using polygons (e.g., triangles) defined by vertices, where the surface of a corresponding polygon is then processed through the graphics pipeline 400 to achieve a final effect (e.g., color, texture, etc.). Vertex attributes may include normal (e.g., which direction is perpendicular to the geometry at that location), color (e.g., RGB: red, green, and blue triple, etc.), and texture coordinate/mapping information.

The geometry processing stage 410 is responsible for (and capable of) both vertex processing (e.g., via a vertex shader) and primitive processing. In particular, the geometry processing stage 410 may output sets of vertices that define primitives and deliver them to the next stage of the graphics pipeline 400, as well as positions (to be precise, homogeneous coordinates) and various other parameters for those vertices. The positions are placed in the position cache 450 for access by later shader stages. The other parameters are placed in the parameter cache 460, again for access by later shader stages.

Various operations may be performed by the geometry processing stage 410, such as performing lighting and shadowing calculations for the primitives and/or polygons. In one embodiment, because the geometry stage is capable of processing primitives, it can perform backface culling and/or clipping (e.g., testing against the view frustum), thereby reducing the load on downstream stages (e.g., rasterization stage 420, etc.). In another embodiment, the geometry stage may generate primitives (e.g., with functionality equivalent to a traditional geometry shader).
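For purposes of illustration only, backface culling of a triangle that has already been projected to the screen may be sketched with a signed-area test. The disclosure does not specify how culling is performed; the winding convention (counter-clockwise is front-facing, with the y axis pointing up) and the function names are assumptions of this sketch.

```python
def signed_area(v0, v1, v2):
    """Signed area of a 2D triangle; positive when the vertices wind counter-clockwise."""
    return 0.5 * ((v1[0] - v0[0]) * (v2[1] - v0[1])
                  - (v2[0] - v0[0]) * (v1[1] - v0[1]))


def is_back_facing(v0, v1, v2):
    """Assumed convention: counter-clockwise triangles face the viewer.

    Degenerate (zero-area) triangles are culled as well.
    """
    return signed_area(v0, v1, v2) <= 0


front = ((0, 0), (1, 0), (0, 1))  # counter-clockwise
back = ((0, 0), (0, 1), (1, 0))   # same triangle, opposite winding
print(is_back_facing(*front), is_back_facing(*back))
```

A triangle rejected here never reaches the rasterization stage, which is the load reduction described above.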

The primitives output by the geometry processing stage 410 are fed into the rasterization stage 420, which converts the primitives into a raster image composed of pixels. In particular, the rasterization stage 420 is configured to project objects in the scene to a two-dimensional (2D) image plane defined by the viewing location in the 3D gaming world (e.g., camera location, user eye location, etc.). At a simplistic level, the rasterization stage 420 looks at each primitive and determines which pixels are affected by the corresponding primitive. In particular, the rasterizer 420 partitions the primitives into pixel-sized fragments, where each fragment corresponds to a pixel in the display. It is important to note that one or more fragments may contribute to the color of a corresponding pixel when displaying an image.
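For purposes of illustration only, determining which pixels are affected by a primitive may be sketched with edge functions evaluated at pixel centers. This is a common textbook approach and is an assumption of this sketch, not a description of rasterization stage 420; the function names are hypothetical, and pixels whose centers lie exactly on an edge are simply counted as covered, whereas real rasterizers apply a fill rule.

```python
def edge(a, b, p):
    """Edge function: its sign tells on which side of the line a->b the point p lies."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def rasterize(v0, v1, v2, width, height):
    """Return the (x, y) pixels whose centers lie inside or on the triangle."""
    covered = []
    for y in range(height):
        for x in range(width):
            p = (x + 0.5, y + 0.5)  # sample at the pixel center
            w0, w1, w2 = edge(v1, v2, p), edge(v2, v0, p), edge(v0, v1, p)
            # Accept either winding order: all three signs must agree.
            if (w0 >= 0 and w1 >= 0 and w2 >= 0) or (w0 <= 0 and w1 <= 0 and w2 <= 0):
                covered.append((x, y))
    return covered


fragments = rasterize((0, 0), (4, 0), (0, 4), width=4, height=4)
print(len(fragments), fragments)
```

Each returned pixel corresponds to one fragment of the primitive in the sense used above.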

As previously described, additional operations may also be performed by the rasterization stage 420, such as clipping (identifying and disregarding fragments that are outside the viewing frustum) and culling with respect to the viewing location (disregarding fragments that are occluded by closer objects). With reference to clipping, the geometry processing stage 410 and/or rasterization stage 420 may be configured to identify and disregard primitives that are outside the viewing frustum as defined by the viewing location in the gaming world.

The pixel processing stage 430 uses the parameters created by the geometry processing stage, as well as other data, to generate values such as the resulting color of the pixel. In particular, the pixel processing stage 430 at its core performs shading operations on the fragments to determine how the color and brightness of a primitive vary with available lighting. For example, pixel processing stage 430 may determine depth, color, normal, and texture coordinates (e.g., texture details) for each fragment, and may further determine appropriate levels of light, darkness, and color for the fragments. In particular, pixel processing stage 430 calculates the traits of each fragment, including color and other attributes (e.g., z-depth for distance from the viewing location, and alpha values for transparency). In addition, the pixel processing stage 430 applies lighting effects to the fragments based on the available lighting affecting the corresponding fragments. Further, the pixel processing stage 430 may apply shadowing effects to each fragment.

The output of the pixel processing stage 430 includes processed fragments (e.g., texture and shading information) and is delivered to the output merger stage 440, the next stage of the graphics pipeline 400. The output merger stage 440 generates a final color for the pixel, using the output of the pixel processing stage 430 as well as other data, such as a value already in memory. For example, the output merger stage 440 may perform optional blending of values between fragments and/or pixels determined from the pixel processing stage 430 and values already written to an MRT for that pixel.
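For purposes of illustration only, one possible form of the optional blending may be sketched as conventional alpha ("source over") blending between an incoming fragment color and the value already stored for that pixel. The disclosure does not specify a blend equation; the equation and the name `blend_over` are assumptions of this sketch.

```python
def blend_over(src_rgb, src_alpha, dst_rgb):
    """Blend an incoming fragment color over the color already stored for the pixel.

    Assumed blend equation: result = alpha * source + (1 - alpha) * destination.
    """
    return tuple(src_alpha * s + (1.0 - src_alpha) * d
                 for s, d in zip(src_rgb, dst_rgb))


stored = (0.0, 0.0, 1.0)    # value already written to the render target
fragment = (1.0, 0.0, 0.0)  # output of the pixel processing stage
print(blend_over(fragment, 0.5, stored))
```

With an alpha of 1.0 the fragment simply replaces the stored value, and with an alpha of 0.0 the stored value is kept unchanged.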

Color values for each pixel in the display may be stored in a frame buffer (not shown). These values are scanned to the corresponding pixels when displaying a corresponding image of the scene. In particular, the display reads color values from the frame buffer for each pixel, row by row, from left to right or right to left, top to bottom or bottom to top, or in any other pattern, and illuminates pixels using those pixel values when displaying the image.

Embodiments of the present disclosure use multiple GPUs in collaboration to generate and/or render a single image frame. The difficulty in using multiple GPUs lies in distributing an equal amount of work to each GPU. Embodiments of the present disclosure are capable of providing an approximately equal amount of work to each GPU (i.e., roughly balancing the distribution of work). They support an increase in pixel count (i.e., resolution) and complexity, and/or an increase in geometric complexity, and/or an increase in the amount of processing per vertex and/or primitive, by analyzing the spatial distribution of the geometry to be rendered and dynamically (i.e., from frame to frame) adjusting GPU responsibility for screen regions to optimize for both geometry work and pixel work. As such, dynamic distribution of GPU responsibility is performed by screen region, as further described below in relation to FIGS. 5A-5B and 6A-6B.

FIGS. 5A-5B show, purely for purposes of illustration, renderings of screens that are subdivided into regions, where each region is assigned to a GPU in a fixed fashion. That is, the assignment of regions to GPUs does not change from image frame to image frame. In FIG. 5A, the screen is subdivided into four quadrants, each of which is assigned to a different GPU. In FIG. 5B, the screen is subdivided into a larger number of interleaved regions, each of which is assigned to a GPU. The discussion of FIGS. 5A-5B below is intended to illustrate the inefficiencies that arise when performing multi-GPU rendering to a plurality of screen regions to which a plurality of GPUs are assigned. FIG. 8 shows more efficient rendering, according to embodiments of the invention.

In particular, FIG. 5A is a diagram of a screen 510A that is subdivided into quadrants (e.g., four regions) when performing multi-GPU rendering. As shown, screen 510A is subdivided into four quadrants (e.g., A, B, C, and D). Each quadrant is assigned to one of the four GPUs (GPU-A, GPU-B, GPU-C, and GPU-D) in a one-to-one relationship. That is, GPU responsibility is distributed by fixed region assignment, wherein each GPU has a fixed assignment to one or more screen regions. For example, GPU-A is assigned to quadrant A, GPU-B is assigned to quadrant B, GPU-C is assigned to quadrant C, and GPU-D is assigned to quadrant D.

The geometry can be culled. For example, CPU 163 can check a bounding box against each quadrant's frustum, and request each GPU to render only the objects that overlap its corresponding frustum. The result is that each GPU is responsible for rendering only a portion of the geometry. For purposes of illustration, screen 510 shows pieces of geometry, where each piece is a corresponding object: objects 511-517. It is understood that a piece of geometry may correspond to a whole object or to a portion of an object (e.g., a primitive, etc.). GPU-A will render no objects, as no objects overlap quadrant A. GPU-B will render objects 515 and 516 (as a portion of object 515 is present in quadrant B, the CPU's culling test will correctly conclude that GPU-B must render it). GPU-C will render objects 511 and 512. GPU-D will render objects 512, 513, 514, 515, and 517.

In FIG. 5A, when screen 510A is divided into quadrants A-D, the amount of work that each GPU must perform may be very different, as a disproportionate amount of geometry may be in one quadrant in some situations. For example, quadrant A has no pieces of geometry, whereas quadrant D has five pieces of geometry, or at least portions of five pieces of geometry. As such, GPU-A, assigned to quadrant A, would be idle, while GPU-D, assigned to quadrant D, would be disproportionately busy when rendering objects in the corresponding image.
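For purposes of illustration only, the CPU-side culling test and the resulting imbalance may be sketched in two dimensions with axis-aligned rectangles. The quadrant layout (A top-left, B top-right, C bottom-left, D bottom-right on an 8 x 8 screen) and all bounding-box coordinates are hypothetical values chosen so that the outcome matches the assignment described for FIG. 5A; they are not read from the figure.

```python
def overlaps(a, b):
    """Axis-aligned rectangles as (x0, y0, x1, y1); merely touching edges do not count."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def assign_objects(regions, boxes):
    """For each GPU, list the objects whose bounding box overlaps its region."""
    return {gpu: sorted(obj for obj, box in boxes.items() if overlaps(rect, box))
            for gpu, rect in regions.items()}


# Hypothetical quadrant layout on an 8 x 8 screen (y grows downward).
quadrants = {
    "GPU-A": (0, 0, 4, 4), "GPU-B": (4, 0, 8, 4),
    "GPU-C": (0, 4, 4, 8), "GPU-D": (4, 4, 8, 8),
}

# Hypothetical bounding boxes for objects 511-517.
boxes = {
    511: (0.5, 5, 2, 7), 512: (3, 6, 5, 7.5), 513: (5, 5, 6, 6),
    514: (6, 6, 7.5, 7.5), 515: (6, 3, 7.5, 5), 516: (5, 1, 6, 2),
    517: (4.5, 4.5, 5, 5),
}

for gpu, objects in assign_objects(quadrants, boxes).items():
    print(gpu, objects)
```

Under these assumed coordinates, GPU-A receives no objects while GPU-D receives five, and objects 512 and 515 are each rendered by two GPUs.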

FIG. 5B illustrates another technique for subdividing a screen into regions, such that screen 510B is subdivided into a plurality of interleaved regions when performing multi-GPU rendering, in accordance with one embodiment of the present disclosure. In particular, rather than being subdivided into quadrants, screen 510B is subdivided into a plurality of regions when performing multi-GPU rendering of a single image or of each of one or more images in a sequence of images. For example, screen 510B may be subdivided into regions corresponding to the GPUs. In that case, screen 510B is subdivided into a larger number of regions (e.g., more than the four quadrants) while using the same number of GPUs (e.g., four) for rendering. The objects (511-517) shown in screen 510A are also shown in screen 510B in the same corresponding locations.

In particular, four GPUs (e.g., GPU-A, GPU-B, GPU-C, and GPU-D) are used to render an image for a corresponding application. Each GPU is responsible for rendering geometry that overlaps its corresponding regions. That is, each GPU is assigned to a corresponding set of regions. For example, GPU-A is responsible for each of the regions labeled A in its set, GPU-B for each of the regions labeled B, GPU-C for each of the regions labeled C, and GPU-D for each of the regions labeled D.

Further, the regions are interleaved in a particular pattern. Because of the interleaving (and the larger number) of regions, the amount of work that each GPU must perform may be much more balanced. For example, the interleaving pattern of screen 510B includes alternating rows containing regions A-B-A-B and so on, and regions C-D-C-D and so on. Other patterns of interleaving the regions are supported in embodiments of the present disclosure. For example, patterns may include repeated sequences of regions, evenly distributed regions, uneven distribution of regions, repeatable rows of sequences of regions, random sequences of regions, random rows of sequences of regions, etc.
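For purposes of illustration only, the alternating-row pattern described for screen 510B may be sketched as a mapping from a region's row and column to the GPU that owns it. The function names and the grid size are hypothetical; the number of regions in FIG. 5B is not stated in this passage.

```python
def owner(row, col):
    """Rows alternate between the sequences A-B-A-B... and C-D-C-D..."""
    pair = "AB" if row % 2 == 0 else "CD"
    return pair[col % 2]


def layout(rows, cols):
    """Return one string per row of regions, one letter per region."""
    return ["".join(owner(r, c) for c in range(cols)) for r in range(rows)]


for line in layout(4, 4):  # hypothetical 4 x 4 grid of regions
    print(line)
```

In this fixed pattern every GPU owns the same number of regions, spread across the screen, which is why the work tends to be more balanced than with quadrants.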

Choosing the number of regions is important. For example, if the distribution of regions is too fine (e.g., the number of regions is too great to be optimal), each GPU must still process most or all of the geometry. For example, it may be difficult to check object bounding boxes against all of the regions for which a GPU is responsible. Also, even if bounding boxes can be checked in a timely manner, the small region size means that each GPU will likely have to process most of the geometry, because every object in the image overlaps at least one of the regions of each GPU (e.g., a GPU processes an entire object even when only a portion of the object overlaps at least one region in the set of regions assigned to that GPU).

As a result, choosing the number of regions is important. Choosing too few or too many regions may lead to inefficiencies when performing GPU processing (e.g., each GPU processing most or all of the geometry) or to imbalances (e.g., one GPU processing many more objects than another). In those cases, even though there are multiple GPUs for rendering an image, these inefficiencies prevent support for a corresponding increase in both screen pixel count and geometry density (i.e., four GPUs cannot write four times the pixels and process four times the vertices or primitives). Therefore, in embodiments of the present disclosure, information may be generated (via "geometry analysis") to indicate which object or objects are present in each of the screen regions. Geometry analysis may be performed while rendering or prior to rendering, and the resulting information can then be used to dynamically assign screen regions to GPUs for further rendering of the corresponding image frame, as further described below. That is, screen regions are not fixed to corresponding GPUs, but may be dynamically assigned to GPUs for rendering a corresponding image frame.
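For purposes of illustration only, one way of using per-region information to assign screen regions to GPUs for a frame may be sketched as a greedy balancing heuristic: regions are taken in order of decreasing cost, and each is given to the GPU with the least accumulated work. This passage does not specify the assignment algorithm; the heuristic, the cost measure (a count of pieces of geometry per region), and all names and values are assumptions of this sketch.

```python
import heapq


def assign_regions(region_cost, gpu_count):
    """Greedy assignment: most expensive regions first, each to the least-loaded GPU."""
    heap = [(0, gpu) for gpu in range(gpu_count)]  # (accumulated cost, GPU index)
    heapq.heapify(heap)
    assignment = {gpu: [] for gpu in range(gpu_count)}
    by_cost = sorted(region_cost.items(), key=lambda item: (-item[1], item[0]))
    for region, cost in by_cost:
        load, gpu = heapq.heappop(heap)
        assignment[gpu].append(region)
        heapq.heappush(heap, (load + cost, gpu))
    return assignment


# Hypothetical output of geometry analysis for one frame:
# the number of pieces of geometry overlapping each screen region.
region_cost = {"r0": 5, "r1": 3, "r2": 3, "r3": 2, "r4": 2, "r5": 1, "r6": 0, "r7": 0}

assignment = assign_regions(region_cost, gpu_count=4)
for gpu, regions in assignment.items():
    print(gpu, regions, sum(region_cost[r] for r in regions))
```

Because the costs come from the current frame, running the same function on the next frame may produce a different assignment, which corresponds to regions not being fixed to GPUs.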

FIGS. 6A-6B illustrate the advantage of splitting an object within an image frame into smaller portions when performing geometry analysis to dynamically assign screen regions to GPUs for geometry rendering of whole objects and/or portions of objects of the image frame, in various embodiments of the present disclosure. In particular, multi-GPU rendering of objects is performed for a single image frame by performing geometry analysis on the objects in the screen. Information is generated for "pieces of geometry," where a piece of geometry can be an entire object or a portion of an object. For example, a piece of geometry may be object 610 or a portion of object 610. Specifically, GPUs are assigned to pieces of geometry (e.g., whole objects and/or portions of objects) to determine the relationship between the geometry and each of a plurality of screen regions. That is, the GPUs in collaboration determine information that provides the relationship between each of the pieces of geometry and each of the screen regions. Analysis is performed on the information to dynamically assign screen regions to the GPUs for subsequent rendering of the corresponding image frame. During the geometry analysis and the subsequent rendering (e.g., rendering of geometry), if an object is associated with a single GPU for geometry rendering (e.g., all screen regions that include the object are dynamically assigned to a single GPU), then the other GPUs can skip that object entirely when rendering the image frame, which results in efficient processing of geometry, according to one embodiment of the present disclosure. In addition, splitting an object into smaller portions can further increase efficiency when performing geometry analysis and/or rendering the geometry in the corresponding image frame.

FIG. 6A illustrates geometry analysis of a whole object (i.e., the amount of geometry used or generated by a corresponding draw call) to determine the relationship of the object to screen regions when multiple GPUs collaborate to render a corresponding image frame, in accordance with one embodiment of the present disclosure. If an object is rendered whole (i.e., the geometry used or generated by the draw call is not split into portions), then each GPU responsible for rendering screen regions that overlap the object must render the whole object. In particular, during geometry analysis, object 610 may be determined to overlap region 620A, and object 610 may also be determined to overlap region 620B. That is, portion 610A of object 610 overlaps region 620A, and portion 610B of object 610 overlaps region 620B. Subsequently, GPU-A is assigned responsibility for rendering objects in screen region 620A, and GPU-B is assigned responsibility for rendering objects in screen region 620B. Because objects are rendered whole, GPU-A is tasked with rendering object 610 fully, i.e., processing all primitives within the object, including primitives across both regions 620A and 620B. In this particular example, GPU-B is also tasked with rendering object 610 in its entirety. That is, there may be duplication of work by GPU-A and GPU-B when rendering the geometry of the object in the corresponding image frame. Also, if there are only a few objects (i.e., draw calls) to distribute among the GPUs, the geometry analysis itself may be difficult to balance.

FIG. 6B illustrates geometry analysis of portions of an object to determine the relationship of those portions to screen regions when multiple GPUs collaborate to render a corresponding image frame, in accordance with one embodiment of the present disclosure. As shown, the geometry used or generated by a draw call is subdivided to create these portions of the object. For example, object 610 may be split into pieces, such that the geometry used or generated by the draw call is subdivided into smaller pieces of geometry. In that case, information is generated for the smaller pieces of geometry during geometry analysis to determine the relationship (e.g., overlap) between the smaller pieces of geometry and each of the screen regions. Geometry analysis is performed using this information to dynamically assign rendering responsibility for each screen region among the GPUs for rendering the smaller pieces of geometry of the corresponding image frame. When rendering the corresponding image frame, each GPU renders only those smaller pieces of geometry that overlap the screen regions for which it is responsible. As such, each GPU is assigned a set of screen regions for rendering pieces of geometry of the corresponding image frame. That is, GPU responsibility is assigned uniquely for each image frame. In this manner, there may be less duplication of work between GPUs when performing geometry analysis and/or rendering the geometry of objects in the corresponding image frame, which increases efficiency when rendering the corresponding image frame.

In one embodiment, the draw calls in the command buffer remain the same, but while rendering, the GPU splits the geometry into pieces. The pieces of geometry may be roughly the size for which the position cache and/or parameter cache are allocated. Each GPU either renders or skips these pieces, such that it renders only those pieces that overlap its assigned screen regions.

For example, object 610 is split into portions, such that the pieces of geometry used for region testing correspond to these smaller portions of object 610. As shown, object 610 is split into pieces of geometry "a", "b", "c", "d", "e", and "f". After geometry analysis, GPU-A may be dynamically assigned to screen region 620A to render pieces of geometry "a", "b", "c", "d", and "e" when rendering the corresponding image frame. That is, GPU-A can skip rendering piece of geometry "f". Also, after geometry analysis, GPU-B may be assigned to screen region 620B to render pieces of geometry "d", "e", and "f" when rendering the corresponding image frame. That is, GPU-B can skip rendering pieces of geometry "a", "b", and "c". As shown, there is little duplication of work between GPU-A and GPU-B, because instead of object 610 being rendered in its entirety by each GPU, only pieces of geometry "d" and "e" are rendered by both GPU-A and GPU-B.
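The render-or-skip decision in the FIG. 6B example can be sketched as a bounding-box test of each piece against each GPU's assigned regions. This is a minimal illustration only: the coordinates, the `Rect` representation, and the function names `overlaps` and `pieces_to_render` are hypothetical and are not taken from the disclosure, which does not prescribe a particular overlap test.

```python
from typing import Dict, List, Tuple

# Axis-aligned rectangle (x0, y0, x1, y1); the upper bounds are exclusive.
Rect = Tuple[int, int, int, int]


def overlaps(a: Rect, b: Rect) -> bool:
    """Return True when the two rectangles share a non-empty area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def pieces_to_render(
    pieces: Dict[str, Rect], gpu_regions: Dict[str, List[Rect]]
) -> Dict[str, List[str]]:
    """For each GPU, list the pieces whose bounds touch any of its regions."""
    return {
        gpu: [
            name
            for name, box in pieces.items()
            if any(overlaps(box, region) for region in regions)
        ]
        for gpu, regions in gpu_regions.items()
    }


# Hypothetical layout: region 620A is the left half and region 620B the
# right half of a 200x100 screen; pieces "d" and "e" straddle the boundary.
gpu_regions = {"GPU-A": [(0, 0, 100, 100)], "GPU-B": [(100, 0, 200, 100)]}
pieces = {
    "a": (10, 40, 30, 60),
    "b": (30, 40, 50, 60),
    "c": (50, 40, 70, 60),
    "d": (70, 40, 110, 60),
    "e": (90, 40, 130, 60),
    "f": (130, 40, 150, 60),
}
render_lists = pieces_to_render(pieces, gpu_regions)
```

With this assumed layout, GPU-A keeps pieces "a" through "e" and skips "f", while GPU-B keeps "d", "e", and "f", so only the two straddling pieces are processed twice.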

Multi-GPU Rendering of Geometry by Performing Geometry Analysis During Rendering
With the detailed description of cloud gaming network 190 (e.g., within game server 160) and GPU resources 365 of FIGS. 1-3, flow diagram 700 of FIG. 7 illustrates a method for graphics processing when implementing multi-GPU rendering of the geometry of an image frame generated by an application, by performing geometry analysis during rendering, in accordance with one embodiment of the present disclosure. Specifically, multiple GPUs collaborate to generate an image frame. Responsibility for certain phases of rendering is divided among the GPUs based on screen regions of each image frame. While rendering geometry, the GPUs generate information about the geometry and its relationship to the screen regions. This information is used to assign GPUs to screen regions, allowing for more efficient rendering. In this manner, multiple GPU resources are used to efficiently render the objects of an image frame when executing an application. As previously described, various architectures may include multiple GPUs that collaborate to render a single image by performing multi-GPU rendering of geometry for an application through region testing during rendering, such as within one or more cloud gaming servers of a cloud gaming system, or within a stand-alone system, such as a personal computer or gaming console that includes a high-end graphics card having multiple GPUs.

At 710, the method includes rendering graphics using multiple GPUs, wherein in certain phases responsibility for rendering is dynamically divided among the GPUs based on screen regions. In particular, multi-GPU processing is performed when rendering a single image frame and/or each of one or more image frames of a sequence of image frames for a real-time application, where each image frame includes a plurality of pieces of geometry. In certain phases, GPU rendering responsibility is dynamically assigned among a plurality of screen regions of each image frame, such that each GPU renders pieces of geometry in its assigned screen regions. That is, each GPU has a corresponding division of responsibility (e.g., corresponding screen regions).

At 720, the method includes using the multiple GPUs in collaboration to render an image frame that includes a corresponding plurality of pieces of geometry. In one embodiment, a pre-pass phase of rendering is performed when rendering. In one embodiment, this pre-pass phase of rendering is a Z pre-pass, in which the plurality of pieces of geometry is rendered.

To perform the pre-pass phase of rendering, at 720, the method includes dividing, among the multiple GPUs, responsibility for processing the plurality of pieces of geometry of the image frame during the Z pre-pass phase of rendering. That is, each of the pieces of geometry is assigned to a corresponding GPU for performing the Z pre-pass, and/or each of the GPUs is assigned a set of screen regions for which it is responsible. As such, the plurality of pieces of geometry is rendered in the Z pre-pass phase at the multiple GPUs to generate one or more Z-buffers. Specifically, each GPU renders its corresponding pieces of geometry in the Z pre-pass phase to generate a corresponding Z-buffer. For example, for a corresponding piece of geometry, the Z-buffer may include a corresponding z-value (e.g., depth value) measuring the distance from a pixel on the projection plane to the piece of geometry. Hidden geometry or objects can be removed from the Z-buffer, as is well known in the art.

In one embodiment, each GPU may have a dedicated Z-buffer. For example, a first GPU renders a first piece of geometry in the Z pre-pass phase to generate a first Z-buffer. The other GPUs render their corresponding pieces of geometry in the Z pre-pass phase to generate corresponding Z-buffers. In one embodiment, each GPU sends the data in its corresponding Z-buffer to each of the other GPUs, so that the corresponding Z-buffers are updated and are approximately the same for use when rendering the geometry of the image frame. That is, each GPU is configured to merge the data received from all of the Z-buffers, so that the corresponding Z-buffer of each GPU is updated in the same way.
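The merge of per-GPU Z-buffers can be sketched as a per-pixel reduction. The sketch below assumes a "smaller z is nearer" depth convention and represents each Z-buffer as a nested list with `FAR` marking unwritten pixels; both choices, and the name `merge_z_buffers`, are illustrative assumptions rather than details from the disclosure.

```python
from typing import List

FAR = float("inf")  # assumed clear value: nothing has been written at this pixel
ZBuffer = List[List[float]]


def merge_z_buffers(buffers: List[ZBuffer]) -> ZBuffer:
    """Merge same-sized Z-buffers by keeping the nearest depth at each pixel.

    Assumes a smaller z-value means closer to the projection plane.
    """
    height, width = len(buffers[0]), len(buffers[0][0])
    return [
        [min(buf[y][x] for buf in buffers) for x in range(width)]
        for y in range(height)
    ]


# Two hypothetical 2x2 Z-buffers produced by two GPUs in the Z pre-pass.
z_gpu_a = [[0.5, FAR], [FAR, FAR]]
z_gpu_b = [[0.7, 0.2], [FAR, FAR]]
merged = merge_z_buffers([z_gpu_a, z_gpu_b])
```

After every GPU applies the same reduction to the data it receives, all copies hold the same depth values, which is why the buffers need a matching size and format.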

At 730, the method includes generating information regarding the plurality of pieces of geometry of the image frame and their relationships to the plurality of screen regions. In one embodiment, the information is generated during the pre-pass phase of rendering. For example, the information may be generated at a first GPU while rendering a piece of geometry, and may indicate which screen regions the piece of geometry overlaps. As previously described, a piece of geometry may be a whole object (i.e., the geometry used or generated by an individual draw call) or a portion of an object (e.g., an individual primitive, a group of primitives, etc.). Further, the information may include the presence of a piece of geometry in a corresponding screen region. The information may include a conservative approximation of the presence of a piece of geometry in a corresponding screen region. The information may include the pixel area or approximate pixel area (e.g., coverage) that the piece of geometry covers in a screen region. The information may include the number of pixels written to a screen region. The information may include the number of pixels written to the Z-buffer per piece of geometry per screen region during the Z pre-pass phase of rendering.

At 740, the method includes using this information in a subsequent assignment of screen regions to the multiple GPUs. Specifically, each GPU is assigned to corresponding screen regions based on the information, for rendering the image frame during a subsequent phase of rendering, which may be a geometry pass. In this manner, the assignment of screen regions to GPUs may vary from image frame to image frame, i.e., it may be dynamic.

FIG. 8 is a diagram of a screen 800 illustrating the dynamic assignment of screen regions to GPUs for geometry rendering (i.e., rendering of pieces of geometry to MRTs) based on an analysis of the geometry of a current image frame performed while rendering the current image frame, in accordance with one embodiment of the present disclosure. As shown, screen 800 may be subdivided into regions, each of which is approximately equal in size for purposes of illustration. In other embodiments, each of the regions may vary in size and shape. For example, region 810 represents an equal subdivision of screen 800.

The objects shown on screen 800 and their positions are identical to the objects and positions shown on screen 510A of FIG. 5A and screen 510B of FIG. 5B. For example, objects 511-517 are shown on screen 800. FIG. 5A shows the division of screen 510A into quadrants that are fixedly assigned to GPUs for geometry rendering. FIG. 5B shows the division of screen 510B into regions that are assigned to GPUs in a fixed manner for geometry rendering. FIG. 8 shows the dynamic assignment of screen regions to GPUs for the current image frame, which includes objects 511-517. The assignment is performed for each image frame. That is, in the next image frame, objects 511-517 may be in different positions, and as such the assignment of screen regions for the next image frame may differ from the assignment for the current image frame. For example, GPU-A is assigned to the set of screen regions 832 and renders objects 511 and 512. GPU-B is assigned to the set of screen regions 834 and renders objects 513, 515, and 517. GPU-C is assigned to the set of screen regions 836 and renders objects 512, 513, 514, and 517. GPU-D is assigned to the set of screen regions 838 and renders objects 515 and 516. When objects are further divided into portions, there may be less duplication of rendering, because smaller portions overlap fewer GPU regions. That is, the draw calls in the corresponding command buffer remain the same, but while rendering, the GPU splits the geometry into pieces (e.g., portions of an object), potentially pieces of roughly the size for which the position and/or parameter caches are allocated, and either renders or skips those pieces depending on whether they overlap the screen regions assigned to that GPU for geometry rendering.

In one embodiment, the assignment of screen regions to GPUs may be handled such that approximately equal amounts of pixel work are performed by each GPU when rendering the geometry. This does not necessarily equate to equal amounts of screen area covered by the corresponding objects, because the pixel shaders associated with the objects may differ in complexity. For example, GPU-D is responsible for rendering four regions and GPU-A is responsible for rendering six regions, yet their corresponding pixel and/or rendering work may be approximately equal. That is, objects may have different rendering costs, such that the cost per pixel, primitive, or vertex may be higher or lower from one object to another. This cost per pixel, primitive, vertex, etc. may be made available to each GPU and used in generating the information, or may be included as part of the information. Alternatively, the cost may be used when assigning the screen regions.

In one embodiment, cross-hatched region 830 contains no geometry and may be assigned to any one of the GPUs. In another embodiment, cross-hatched region 830 is not assigned to any of the GPUs. In either case, no geometry rendering is performed for region 830.

In another embodiment, all regions associated with an object are assigned to a single GPU. In this manner, all of the other GPUs can skip the object entirely when performing geometry rendering.

FIGS. 9A-9C are diagrams providing a more detailed description of the rendering of an image frame showing four objects, where the rendering of the image frame includes a Z pre-pass phase and a geometry phase of rendering. As previously described, the Z pre-pass phase is performed to generate the information used to dynamically assign screen regions to GPUs for geometry rendering of the image frame, in accordance with embodiments of the present disclosure. For purposes of illustration, FIGS. 9A-9C show the use of multiple GPUs to render each of a sequence of image frames. The choice of four GPUs for the example shown in FIGS. 9A-9C is made purely to illustrate multi-GPU rendering, and it is understood that any number of GPUs may be used for multi-GPU rendering in various embodiments.

Specifically, FIG. 9A illustrates a screen 900A showing four objects included within an image frame. For example, the image frame includes object 0, object 1, object 2, and object 3. As shown, screen 900A is divided into multiple regions. For example, screen 900A may be divided into more than four regions, each of which is assigned to a corresponding GPU for rendering the current image frame.

In one embodiment, a single command buffer is used by the multiple GPUs to render a corresponding image frame. The common rendering command buffer may include draw calls and state settings for each object for performing the Z pre-pass phase of rendering. A sync (e.g., synchronization) operation may be included in the command buffer so that all of the GPUs begin the geometry pass phase of rendering at the same time. The command buffer may also include draw calls and state settings for each object for performing the geometry pass phase of rendering.

In one embodiment, the common rendering command buffer supports the ability for a command to be executed by one GPU but not by another. That is, the format of the common rendering command buffer allows a command to be executed by one GPU or by a subset of the multiple GPUs. For example, as previously described, flags on a draw command, or predication, in the rendering command buffer allow a single GPU to execute one or more commands in the corresponding command buffer without interference from the other GPUs.
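One way to picture a shared command buffer with per-GPU execution is a bitmask on each command. The sketch below is an assumption made for illustration: the `Command` class, the `gpu_mask` field, and the command names are hypothetical, and the disclosure does not specify how the flags or predication are encoded.

```python
from dataclasses import dataclass
from typing import List

ALL_GPUS = 0b1111  # hypothetical four-GPU system: bit i set means GPU i executes


@dataclass
class Command:
    name: str
    gpu_mask: int = ALL_GPUS  # by default every GPU executes the command


def execute(command_buffer: List[Command], gpu_index: int) -> List[str]:
    """Return the commands that the given GPU runs from the shared buffer."""
    return [c.name for c in command_buffer if c.gpu_mask & (1 << gpu_index)]


# One buffer shared by all GPUs: the Z pre-pass draws are predicated per GPU,
# then a sync, then geometry pass draws that every GPU sees.
command_buffer = [
    Command("set_state"),
    Command("zprepass_draw_object_0", gpu_mask=0b0001),
    Command("zprepass_draw_object_1", gpu_mask=0b0010),
    Command("sync"),
    Command("geometry_draw_object_0"),
]
gpu0_commands = execute(command_buffer, 0)
gpu2_commands = execute(command_buffer, 2)
```

Every GPU walks the same buffer, and the mask decides per command whether that GPU does any work, so the buffer contents do not need to be rebuilt per GPU.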

FIG. 9B illustrates the Z pre-pass phase of rendering, which is performed to generate one or more Z-buffers and information relating the pieces of geometry of a particular image frame to each of the screen regions and/or sub-regions of the drawn screen, in accordance with one embodiment of the present disclosure. FIG. 9B shows one strategy by which multiple GPUs can collaborate in the Z pre-pass phase of rendering to generate one or more Z-buffers for a frame. Other strategies may be implemented to generate the one or more Z-buffers.

As shown, each GPU in the multi-GPU architecture is allocated a portion of the geometry. For purposes of illustration, GPU-A is assigned to object 0, GPU-B is assigned to object 1, GPU-C is assigned to object 2, and GPU-D is assigned to object 3. Each GPU renders its corresponding object in the Z pre-pass phase into its own copy of the Z-buffer. For example, in the Z pre-pass phase, GPU-A renders object 0 into its Z-buffer. Screen 921 shows the pixel coverage of object 0 as determined by GPU-A and stored in its corresponding Z-buffer. Also, GPU-B renders object 1 into its Z-buffer, such that screen 922 shows the pixel coverage of object 1 as determined by GPU-B and stored in its corresponding Z-buffer. In addition, GPU-C renders object 2 into its Z-buffer, such that screen 923 shows the pixel coverage of object 2 as determined by GPU-C and stored in its corresponding Z-buffer. Further, GPU-D renders object 3 into its Z-buffer, such that screen 924 shows the pixel coverage of object 3 as determined by GPU-D and stored in its corresponding Z-buffer.

Thereafter, the four Z-buffer copies corresponding to the GPUs are merged. That is, each GPU has a corresponding copy of the Z-buffer in its own RAM (random access memory). In one embodiment, the strategy for building the one or more Z-buffers includes having each GPU send its completed Z-buffer to the other GPUs. As such, each of the Z-buffers must be similar in size and format. Specifically, the data in each of the Z-buffers is sent to all of the GPUs in order to merge and update each of the Z-buffers, as shown by screen 925, which shows the pixel coverage of each of the four objects 0-3 as stored in each of the updated Z-buffers of the GPUs. The objects are blank in FIG. 9B, which represents that only Z has been written and that other values (e.g., color) have not been computed for each of the pixels of the screen.

In another embodiment, the merge time is reduced. Rather than waiting for each Z-buffer to be fully completed by its corresponding GPU before the data is sent to the other GPUs, each GPU sends Z-buffer data for the updated screen regions to the other GPUs as it writes its corresponding pieces of geometry to its Z-buffer. That is, as a first GPU renders geometry to a corresponding Z-buffer or other render target, the first GPU sends to the other GPUs the Z-buffer data or other render target data for the screen regions that were updated. By not waiting for each Z-buffer of the corresponding GPUs to be completely written before sending, a portion of the time required to merge the Z-buffers is eliminated, thereby reducing the merge time.

In another embodiment, another strategy for building the Z-buffer includes sharing a common Z-buffer or a common render target among the multiple GPUs. For example, the hardware used to perform Z-buffering may be configured such that there is a common Z-buffer or common render target that is shared and updated by each of the GPUs. That is, each GPU updates the common Z-buffer while rendering one or more corresponding pieces of geometry in the Z pre-pass phase of rendering. In the example of a four-GPU architecture, a first GPU renders geometry to a corresponding Z-buffer or other render target by updating the common Z-buffer or common render target, each of which is shared by the multiple GPUs. Using a common Z-buffer or common render target eliminates the need for a merge step. In one embodiment, screen regions are allocated to the GPUs, which simplifies the arbitration required when accessing the common Z-buffer.

As previously described, the information is generated while rendering the Z-buffer. In one embodiment, a scan converter, executing as part of rasterization stage 420 of FIG. 4, generates the information. For example, the scan converter may compute the area of overlap between a piece of geometry and each of the screen regions. In various embodiments, the overlap may be measured in pixels, such as between each primitive of a piece of geometry and each screen region. Further, the scan converter may sum the areas of overlap, as measured per region, to produce the total area of overlap (e.g., in pixels) for each piece of geometry.
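The per-region overlap totals described here can be sketched in software as a simple summation. As a loudly stated simplification, primitives are modeled below as axis-aligned rectangles rather than triangles, overlapping primitives are counted twice, and depth testing is ignored; the names `overlap_area` and `coverage_table` are hypothetical and do not describe an actual hardware scan converter.

```python
from typing import Dict, List, Tuple

# Axis-aligned rectangle (x0, y0, x1, y1) in pixels; upper bounds exclusive.
Rect = Tuple[int, int, int, int]


def overlap_area(a: Rect, b: Rect) -> int:
    """Number of pixels shared by two rectangles (0 when they are disjoint)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0) * max(height, 0)


def coverage_table(
    pieces: Dict[str, List[Rect]], regions: Dict[str, Rect]
) -> Dict[str, Dict[str, int]]:
    """Sum, per piece and per region, the overlap of the piece's primitives."""
    table: Dict[str, Dict[str, int]] = {}
    for piece, primitives in pieces.items():
        per_region: Dict[str, int] = {}
        for region_name, region in regions.items():
            area = sum(overlap_area(prim, region) for prim in primitives)
            if area:  # record only the regions the piece actually touches
                per_region[region_name] = area
        table[piece] = per_region
    return table


# Hypothetical 16x8 screen split into two 8x8 regions.
regions = {"R0": (0, 0, 8, 8), "R1": (8, 0, 16, 8)}
pieces = {
    "p0": [(2, 2, 6, 6), (6, 2, 10, 6)],  # second primitive straddles R0/R1
    "p1": [(12, 0, 16, 4)],
}
info = coverage_table(pieces, regions)
```

The resulting table gives, for each piece of geometry, the pixel area it covers in each screen region, which is the kind of information the later assignment step consumes.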

Before the geometry pass begins, this information can be used to assign screen regions to GPUs. That is, one or more of the multiple GPUs may be assigned to each screen region. In one embodiment, the assignment is made such that the rendering responsibility (e.g., geometry to be rendered) of each GPU is approximately equal. In this manner, information generated in one phase of rendering (the Z pre-pass phase) is used in another phase of rendering, such as to assign screen regions to GPUs for the geometry pass phase of rendering.

As previously described, an object may have a rendering cost that differs from that of other objects. That is, the cost per pixel, primitive, or vertex of one object may be higher or lower than that of another object. In some embodiments, the cost per pixel/primitive/vertex is made available to the GPUs and is used in generating the information and/or is included in the information. In another embodiment, the cost per pixel/primitive/vertex is used when assigning screen regions to GPUs, such that the information generated takes into account the approximate rendering cost of the corresponding piece of geometry per pixel, primitive, or vertex. That is, a plurality of costs is determined for rendering the plurality of pieces of geometry of the image frame during the geometry phase of rendering. These costs are considered when assigning screen regions to GPUs for geometry rendering. For example, the subsequent assignment of screen regions to the multiple GPUs takes into account the approximate rendering cost of the pieces of geometry per pixel, primitive, or vertex, such that the GPUs can be assigned to screen regions in a manner that divides the rendering cost among the GPUs as desired (equally or unequally).
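A cost-aware assignment could be sketched as follows. The greedy "most expensive region to the least loaded GPU" heuristic used here is a swapped-in, well-known load-balancing technique chosen for illustration; the disclosure does not name an assignment algorithm. The functions `region_costs` and `assign_regions`, and all numbers, are hypothetical.

```python
import heapq
from typing import Dict, List


def region_costs(
    coverage: Dict[str, Dict[str, int]], cost_per_pixel: Dict[str, float]
) -> Dict[str, float]:
    """Estimate each region's cost as sum(pixels covered * per-pixel cost)."""
    costs: Dict[str, float] = {}
    for piece, per_region in coverage.items():
        for region, pixels in per_region.items():
            costs[region] = costs.get(region, 0.0) + pixels * cost_per_pixel[piece]
    return costs


def assign_regions(costs: Dict[str, float], gpus: List[str]) -> Dict[str, List[str]]:
    """Greedy balancing: give the costliest remaining region to the idlest GPU."""
    heap = [(0.0, index, gpu) for index, gpu in enumerate(gpus)]
    heapq.heapify(heap)
    assignment: Dict[str, List[str]] = {gpu: [] for gpu in gpus}
    for region, cost in sorted(costs.items(), key=lambda kv: (-kv[1], kv[0])):
        if cost == 0:
            continue  # a region without geometry needs no geometry rendering
        load, index, gpu = heapq.heappop(heap)
        assignment[gpu].append(region)
        heapq.heappush(heap, (load + cost, index, gpu))
    return assignment


# Hypothetical pixel coverage per object per region, and per-pixel shader cost.
coverage = {"obj0": {"R0": 100, "R1": 50}, "obj1": {"R2": 100}, "obj2": {"R3": 30}}
cost_per_pixel = {"obj0": 1.0, "obj1": 3.0, "obj2": 1.0}
costs = region_costs(coverage, cost_per_pixel)
assignment = assign_regions(costs, ["GPU-A", "GPU-B"])
```

In this assumed example one GPU ends up with a single expensive region and the other with three cheaper ones, which mirrors the point that balanced work need not mean an equal number of regions.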

FIG. 9C illustrates the geometry pass phase of rendering, which is performed to render the pieces of geometry of a particular image frame, in accordance with one embodiment of the present disclosure. In the geometry pass phase, each GPU renders the objects of the particular image frame to the screen regions for which it is responsible (e.g., based on the previous assignment of GPUs to screen regions). Specifically, each GPU renders all objects, except those objects that are known (based on the information) to have no overlap with the screen regions assigned to that GPU for geometry rendering. As such, if a piece of geometry does not overlap the screen regions assigned to a particular GPU, that GPU can skip rendering that piece of geometry.
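The skip rule for the geometry pass can be sketched as a set-intersection test against the information generated earlier. The function name `geometry_pass`, the region labels, and the data layout are hypothetical; treating a piece with no recorded information as "must render" is an assumption consistent with skipping only what is known not to overlap.

```python
from typing import Dict, List, Set


def geometry_pass(
    draw_calls: List[str],
    piece_regions: Dict[str, Set[str]],
    my_regions: Set[str],
) -> List[str]:
    """Return the pieces this GPU renders, in draw-call order."""
    rendered = []
    for piece in draw_calls:
        regions = piece_regions.get(piece)
        # Skip only when the information proves there is no overlap;
        # a piece with no recorded information is rendered conservatively.
        if regions is not None and not (regions & my_regions):
            continue
        rendered.append(piece)
    return rendered


# Hypothetical information from the Z pre-pass: regions each object touches.
draw_calls = ["obj0", "obj1", "obj2", "obj3"]
piece_regions = {"obj0": {"R0"}, "obj1": {"R1", "R2"}, "obj2": {"R2", "R3"}}
gpu_a_rendered = geometry_pass(draw_calls, piece_regions, {"R0"})
gpu_b_rendered = geometry_pass(draw_calls, piece_regions, {"R1", "R2"})
```

Here "obj3" has no recorded information, so both GPUs render it, while every other object is rendered only by GPUs whose regions it touches.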

As shown, each GPU of the multi-GPU architecture is assigned to a portion of the screen. For purposes of illustration, GPU-A is assigned to the one region labeled 931A and renders object 0, introduced in FIG. 9A (shown here dimmed to indicate that other values, such as color data, are being written). Screen 931 shows the render target data (e.g., pixels) of object 0 after geometry rendering. GPU-B is assigned to the two regions labeled 932A and renders portions of object 1 and object 2 (the respective portions of those objects are dimmed). Screen 932 shows the render target data (e.g., pixels) of the respective portions of objects 1 and 2 after geometry rendering. GPU-C is assigned to the two regions labeled 933A and renders portions of object 2 (the respective portions are dimmed). Screen 933 shows the render target data (e.g., pixels) of the respective portions of object 2 after geometry rendering. GPU-D is assigned to the three regions labeled 934A and renders object 3 (shown here dimmed to indicate that other values, such as color data, are being written). Screen 934 shows the render target data (e.g., pixels) of object 3 after geometry rendering.

After the geometry is rendered, the render target data generated by each GPU may need to be merged. For example, the geometry data generated by each GPU during the geometry pass phase of rendering is merged, as shown by screen 935, which contains the render target data (e.g., pixels) of all four objects 0-3.

In one embodiment, the assignment of screen regions to GPUs changes from frame to frame. That is, when the assignments for two consecutive image frames are compared, each GPU may be responsible for different screen regions. In another embodiment, the assignment of screen regions to GPUs may also change across the phases used in rendering a single frame. That is, the assignment of screen regions may change dynamically during a rendering phase, such as the geometry analysis phase (e.g., Z pre-pass) or the geometry pass phase.

For example, when the assignment for the geometry phase is made, it may differ from the existing assignment. That is, GPU-A may now be responsible for a screen region for which GPU-B was previously responsible. This may require transferring Z-buffer or other render target data from the memory of GPU-B to the memory of GPU-A. As one example, the information may include the first object in the command buffer that writes to a screen region. This information can be used to schedule a DMA (direct memory access) transfer, such as a transfer of the Z-buffer data or other render target data for the screen region from one GPU to another. Following the example above, data (e.g., Z-buffer or render target data) may be transferred from the memory of GPU-B to the memory of GPU-A. In some cases, the later the first use of a screen region occurs in rendering the image frame, the more time is available for the DMA transfer.
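One way to picture how the "first object that writes to a screen region" could drive transfer scheduling is the sketch below. It lists the regions whose owner changed and orders the transfers so that regions needed earliest are moved first. The function name, the dictionaries, and the earliest-first ordering are illustrative assumptions; the disclosure states only that this information can be used to schedule DMA transfers.

```python
def schedule_transfers(old_owner, new_owner, first_use):
    """List the (region, src_gpu, dst_gpu) transfers needed after reassignment.

    old_owner / new_owner: dict region id -> GPU name before / after reassignment.
    first_use: dict region id -> index of the first object in the command
               buffer that writes to the region (absent if never written).
    Transfers are ordered by first use, so regions needed soonest go first.
    """
    moved = [r for r in new_owner if old_owner.get(r) not in (None, new_owner[r])]
    # Regions that are never written have the most slack and are moved last.
    moved.sort(key=lambda r: (first_use.get(r, float("inf")), r))
    return [(r, old_owner[r], new_owner[r]) for r in moved]

old = {0: "B", 1: "B", 2: "C", 3: "D"}
new = {0: "A", 1: "B", 2: "A", 3: "C"}
plan = schedule_transfers(old, new, first_use={0: 5, 2: 1})
# plan == [(2, "C", "A"), (0, "B", "A"), (3, "D", "C")]
```

Region 1 keeps its owner, so no transfer is generated for it.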

In another embodiment, upon completion of all updates to the Z-buffer or other render target data among the GPUs, the information may include the last object in the command buffer that writes to a screen region. That information can be used to schedule a DMA transfer from the GPU that performed the rendering during the Z pre-pass phase to other GPUs. That is, the information is used to schedule the transfer of the Z-buffer or other render target data for a screen region from one GPU to another GPU (e.g., a rendering GPU).

In yet another embodiment, upon completion of all updates to the Z-buffer or other render target data among the GPUs, the updated data may be broadcast to the GPUs. The updated data is then available should any of the GPUs need it. In another embodiment, the data is sent to a specific GPU, for example in anticipation of the receiving GPU being responsible for the screen region in a subsequent phase of rendering.

FIG. 10 illustrates rendering of an image frame using dynamic assignment of screen regions to GPUs for geometry rendering, based on whole objects or on portions of objects, according to one embodiment of the present disclosure, where the assignment is based on an analysis of the geometry of the current image frame performed during the Z pre-pass phase of rendering while the image frame is being rendered. Specifically, rendering timing diagram 1000A shows rendering of an image frame based on whole objects (i.e., the geometry used or generated by individual draw calls). In contrast, rendering timing diagram 1000B shows rendering of the image frame based on portions of objects. The advantages shown when rendering based on portions of objects include better-balanced rendering performance among the GPUs and, consequently, a shorter rendering time for the image frame.

Specifically, rendering timing diagram 1000A shows the rendering of four objects 0-3 by four GPUs (e.g., GPU-A, GPU-B, GPU-C, and GPU-D), where responsibility for rendering is distributed among the GPUs at the granularity of whole objects. Objects 0-3 were introduced previously in FIGS. 9A-9C. The phases of rendering are shown in relation to timeline 1090. Vertical line 1001A indicates the start of Z pre-pass rendering. Rendering timing diagram 1000A includes the Z pre-pass phase 1010A of rendering and also shows phase 1020A, in which Z-buffer data is merged among the GPUs. GPU idle time is shown using hatched areas, and merge phase 1020A may occur during this idle time. Sync point 1030A is provided so that the GPUs begin their respective geometry pass phases of rendering at the same time. Rendering timing diagram 1000A also includes the geometry pass phase 1040A of rendering, in which the geometry of the image frame is rendered as described above. Sync point 1050A is provided so that the GPUs begin rendering the next image frame at the same time. Sync point 1050A may also indicate the end of rendering of the corresponding image frame. The total time to render the image frame when rendering whole objects is shown as period 1070. The processing of the information to determine each GPU's screen region responsibility is not shown in the figure, but can be presumed to complete before the start of geometry pass 1030A.

As shown, the hatched areas of rendering timing diagram 1000A during geometry pass phase 1040A indicate GPU idle time. For example, GPU-A is idle for approximately as long as it spends rendering. In contrast, GPU-B is idle for very little time, and GPU-C is never idle.

In contrast, rendering timing diagram 1000B shows the rendering of four objects 0-3 by four GPUs (e.g., GPU-A, GPU-B, GPU-C, and GPU-D), where responsibility for rendering is distributed among the GPUs at the granularity of portions of objects, such as the pieces of geometry shown in FIG. 6B, rather than whole objects. For example, the information (e.g., overlap with screen regions) is generated for pieces of geometry (e.g., portions of objects) rather than for whole objects. In this way, the geometry of the image frame used or generated by a draw call (e.g., a whole object) is subdivided into smaller pieces of geometry, and the generated information pertains to those pieces. In some cases, there is a limit to how far pieces of geometry can be subdivided.

The phases of rendering are shown in relation to timeline 1090. Vertical line 1001B indicates the start of Z pre-pass rendering. Rendering timing diagram 1000B includes the Z pre-pass phase 1010B of rendering and also shows hatched period 1020B, during which Z-buffer data is merged among the GPUs. GPU idle time 1020B in rendering timing diagram 1000B is shorter than idle time 1020A in rendering timing diagram 1000A. As shown, each GPU spends approximately the same amount of time processing the Z pre-pass phase, with little or no idle time. Sync point 1030B is provided so that the GPUs begin their respective geometry pass phases of rendering at the same time. Rendering timing diagram 1000B also includes the geometry pass phase 1040B of rendering, in which the geometry of the image frame is rendered as described above. Sync point 1050B is provided so that the GPUs begin rendering the next image frame at the same time. Sync point 1050B may also indicate the end of rendering of the corresponding image frame. As shown, each GPU spends approximately the same amount of time processing the geometry pass phase, with little or no idle time. That is, the Z pre-pass rendering and the geometry rendering are each roughly balanced among the GPUs. The total time to render the image frame when rendering by portions of objects is shown as period 1075. The processing of the information to determine each GPU's screen region responsibility is not shown in the figure, but can be presumed to complete before the start of geometry pass 1030B.

As shown, rendering timing diagram 1000B illustrates the reduced rendering time obtained when responsibility for rendering is distributed among the GPUs at the granularity of portions of objects rather than whole objects. For example, time savings 1077 are shown for rendering the image frame at the granularity of portions of objects.
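The relationship between load balance, idle time, and total phase time can be made concrete with a small calculation. Because the GPUs wait for one another at a sync point, a phase lasts as long as the busiest GPU, and every other GPU idles for the difference. The workload numbers below are invented for illustration and are not taken from the figures; only the qualitative pattern (GPU-A idle about as long as it renders, GPU-C never idle) follows the description.

```python
def phase_time_and_idle(work_per_gpu):
    """Duration of a phase that ends at a sync point, plus each GPU's idle time."""
    duration = max(work_per_gpu.values())
    idle = {gpu: duration - work for gpu, work in work_per_gpu.items()}
    return duration, idle

# Hypothetical geometry-pass workloads in microseconds (same total work: 12500).
whole_objects = {"A": 2000, "B": 3500, "C": 4000, "D": 3000}
object_parts = {"A": 3100, "B": 3200, "C": 3100, "D": 3100}

time_whole, idle_whole = phase_time_and_idle(whole_objects)
time_parts, idle_parts = phase_time_and_idle(object_parts)
saving = time_whole - time_parts
# time_whole == 4000, time_parts == 3200, saving == 800
```

With the same total work, the finer granularity shortens the phase simply because the maximum per-GPU workload moves closer to the average.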

In addition, according to one embodiment of the present disclosure, the information allows requirements and/or dependencies between phases of rendering to be relaxed, with the result that a GPU can proceed to a subsequent phase of rendering while another GPU is still processing the current phase. For example, the requirement that Z pre-pass phase 1020A or 1020B complete on all GPUs before any GPU begins geometry phase 1040A or 1040B may be relaxed. As shown, rendering timing diagram 1000A includes sync point 1020A for all GPUs before geometry phase 1040A begins. However, the information may indicate that, for example, GPU-A can begin rendering its assigned regions before the other GPUs have completed their corresponding Z pre-pass phases of rendering. This may reduce the overall rendering time of the image frame.
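A minimal sketch of such a relaxed dependency check follows. It assumes, purely for illustration, that the information records which GPUs write Z data into each screen region during the Z pre-pass; a GPU may then start its geometry pass once all of those writers have finished, without waiting for every GPU. The names are hypothetical, and the sketch ignores the time needed to merge or transfer the Z data.

```python
def can_start_geometry_pass(gpu_regions, region_writers, finished_gpus):
    """True if every GPU that writes Z data into this GPU's regions has finished.

    gpu_regions: set of region ids assigned to the GPU for the geometry pass.
    region_writers: dict region id -> set of GPUs writing Z data to that region.
    finished_gpus: set of GPUs that have completed their Z pre-pass work.
    """
    needed = set()
    for region in gpu_regions:
        needed |= region_writers.get(region, set())
    return needed <= finished_gpus

writers = {0: {"A"}, 1: {"B", "C"}}
early = can_start_geometry_pass({0}, writers, finished_gpus={"A"})        # True
blocked = can_start_geometry_pass({1}, writers, finished_gpus={"A", "B"})  # False
```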

FIG. 11 is a diagram illustrating the interleaving of GPU assignments to the pieces of geometry of an image frame for performing the Z pre-pass phase of rendering, which generates the information used for the dynamic assignment of screen regions to GPUs for geometry rendering of the image frame, according to one embodiment of the present disclosure. That is, FIG. 11 shows the distribution of rendering responsibility among the GPUs for the Z pre-pass. As described above, each GPU is assigned a corresponding portion of the geometry of the image frame, which may be further divided into objects, portions of objects, geometry, pieces of geometry, and so on.

As shown in FIG. 11, objects 0, 1, and 2 represent the geometry used or generated by individual draw calls. In one embodiment, the GPUs divide each object into smaller pieces of geometry, such as pieces of roughly the size for which the position cache and/or parameter cache are allocated, as described above. Purely for illustration, object 0 is divided into pieces "a", "b", "c", "d", "e", and "f", like object 610 of FIG. 6B. Object 1 is divided into pieces "g", "h", and "i". Object 2 is divided into pieces "j", "k", "l", "m", "n", and "o". The pieces may be ordered (e.g., a through o) for the purpose of distributing responsibility for performing the Z pre-pass phase of rendering.

Distribution 1110 (e.g., the ABCDABCDABCD... row) shows an even distribution of responsibility for performing geometry testing among the GPUs. Specifically, rather than having one GPU take the first quarter of the geometry as a block (e.g., GPU-A taking the first four pieces, "a", "b", "c", and "d", of approximately sixteen total pieces for testing), a second GPU take the second quarter, and so on, the assignments to the GPUs are interleaved. That is, consecutive pieces of geometry are assigned to different GPUs for performing the Z pre-pass phase of rendering. For example, piece "a" is assigned to GPU-A, piece "b" to GPU-B, piece "c" to GPU-C, piece "d" to GPU-D, piece "e" to GPU-A, piece "f" to GPU-B, and piece "g" to GPU-C. As a result, there is no need to know the total number of pieces of geometry to be processed (as there would be if GPU-A were to take the first quarter of the pieces), and the processing of the Z pre-pass phase of rendering is roughly balanced among the GPUs (e.g., GPU-A, GPU-B, GPU-C, and GPU-D).
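The interleaved pattern of distribution 1110 corresponds to a round-robin assignment, sketched below in Python. The function name is hypothetical. Because the GPU list is simply cycled, the sketch never needs the total number of pieces, which mirrors the property noted above.

```python
from itertools import cycle

def interleave_pieces(pieces, gpus):
    """Assign successive pieces to successive GPUs (A, B, C, D, A, B, ...).

    `pieces` may be any iterable, including one whose length is not known
    in advance.
    """
    return [(piece, gpu) for piece, gpu in zip(pieces, cycle(gpus))]

pairs = interleave_pieces("abcdefg", ["GPU-A", "GPU-B", "GPU-C", "GPU-D"])
# Pieces a..d go to GPU-A..GPU-D, then e -> GPU-A, f -> GPU-B, g -> GPU-C.
```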

In other embodiments, information generated while rendering one frame (e.g., the previous image frame) can be used to assign GPUs to screen regions for a subsequent frame (e.g., the current image frame). For example, the hardware may be configured to generate information during the geometry pass phase of rendering of the previous image frame, such as GPU usage during that phase. Specifically, this information may include the actual number of pixels shaded per piece of geometry per screen region. This information can be used in the subsequent frame (e.g., when rendering the current image frame) when assigning GPUs to screen regions for the geometry pass of rendering. That is, the assignment of screen regions to GPUs for performing the geometry pass phase of rendering of the current image frame takes into account both the information generated from the previous image frame and the information generated in the Z pre-pass phase of the current image frame (if any), as described above. Thus, screen regions are assigned to GPUs based on information from the previous image frame (e.g., GPU usage) and on information generated during the Z pre-pass phase of rendering of the current image frame (if any).

This information from the previous frame can be more accurate than using only the overlap area described above (e.g., when generating the information for the current image frame), or only the number of pixels written to the Z-buffer per piece of geometry per screen region during the Z pre-pass. For example, the number of pixels written to the Z-buffer for an object may not correspond to the number of pixels that need to be shaded in the geometry pass, because the object may be occluded by other objects. Using both the information from the previous image frame (e.g., GPU usage) and the information generated during the Z pre-pass phase of rendering of the current image frame may make rendering more efficient during the geometry pass phase of rendering of the current image frame.

The information may also include a vertex count for each screen region, giving the number of vertices used by the corresponding portion of geometry (e.g., piece of geometry) that overlaps the corresponding screen region. Then, when later rendering the corresponding piece of geometry, the rendering GPU can use the vertex count to allocate space in the position cache and the parameter cache. For example, in one embodiment, no space is allocated for vertices that are not needed, which can increase the efficiency of rendering.

In yet another embodiment, there may be processing overhead (in either software or hardware) associated with generating the information during the Z pre-pass phase of rendering. In that case, it may be beneficial to skip generating the information for certain pieces of geometry. That is, the information may be generated for some objects and not for others. For example, the information may not be generated for a piece of geometry (e.g., an object or a portion of an object) that has large primitives and is likely to overlap a large number of screen regions. An object having large primitives may be a skybox or a large piece of terrain that contains, for example, large triangles. In that case, each GPU used for multi-GPU rendering of the image frame will likely need to render those pieces of geometry, and information indicating as much is unnecessary. Thus, the information may or may not be generated depending on the characteristics of the corresponding piece of geometry.

Systems and Methods for Efficient Multi-GPU Rendering of Geometry by Performing Geometry Analysis Before Rendering
With the detailed description of cloud game network 190 (e.g., in game server 160) and GPU resources 365 of FIGS. 1-3, flow diagram 1200A of FIG. 12A illustrates a method for graphics processing including multi-GPU rendering of geometry for an application by performing geometry analysis before rendering, according to one embodiment of the present disclosure. That is, instead of generating the information during rendering as described in relation to FIGS. 7, 9, and 10, the information is generated before rendering, such as during a pre-pass (i.e., a pass that does not write to the Z-buffer or MRTs). It is understood that one or more of the features and advantages of the embodiments described in relation to generating the information during rendering (e.g., in the Z pre-pass phase of rendering) apply equally to generating the information before rendering (e.g., in a pre-pass that performs geometry analysis), and may not be repeated here in order to minimize duplication of description. As described above, various architectures may include multiple GPUs that cooperate to render a single image by performing multi-GPU rendering of geometry for an application through region testing during rendering, such as within one or more cloud gaming servers of a cloud gaming system, or within a standalone system such as a personal computer or game console that includes a high-end graphics card having multiple GPUs.

Specifically, responsibility for GPU rendering is dynamically assigned among the plurality of screen regions of each image frame, such that each GPU renders objects in its assigned screen regions. The analysis is performed before geometry rendering (e.g., in a primitive shader or compute shader) to determine the spatial distribution of the geometry in the image frame, and the responsibility of the GPUs for screen regions is dynamically adjusted for rendering the objects in that image frame.

At 1210, the method includes rendering graphics for an application using a plurality of graphics processing units (GPUs). Specifically, the GPUs cooperate to generate an image frame. In particular, multi-GPU processing is performed when rendering a single image frame and/or each of one or more image frames of a sequence of image frames for a real-time application. Responsibility for rendering is divided among the GPUs based on the screen regions of each image frame, as further described below.

At 1220, the method includes dividing, among the plurality of GPUs, responsibility for processing a plurality of pieces of geometry of an image frame during an analysis pre-pass, such that each of the pieces of geometry is assigned to a corresponding GPU. The analysis pre-pass is performed before the rendering phases for the image frame.

In the analysis pre-pass, the objects are distributed among the GPUs. For example, in a multi-GPU architecture having four GPUs, each GPU processes approximately one quarter of the objects during the analysis pre-pass. As described above, in one embodiment there may be a benefit to subdividing objects into smaller pieces of geometry. In addition, in other embodiments, objects are dynamically assigned to GPUs for each image frame. Dynamically assigning pieces of geometry to GPUs for the analysis pre-pass may improve processing efficiency.

Because the analysis pre-pass is performed before the rendering phases, the processing is typically not performed in hardware. That is, the analysis pre-pass may be performed in software, such as by using shaders in various embodiments. For example, a primitive shader may be used during the analysis pre-pass, such that there is no corresponding pixel shader. In addition, the Z-buffer and/or other render targets are not written during the analysis pre-pass. In other embodiments, a compute shader is used.

At 1230, the method includes determining, in the analysis pre-pass, the overlap of each of the plurality of pieces of geometry with each of the plurality of screen regions. As described above, a piece of geometry may be an object or a portion of an object (e.g., an individual primitive, a group of primitives, etc.). In one embodiment, the generated information includes an accurate representation of the overlap of each of the pieces of geometry with each of the screen regions. In another embodiment, the information includes an approximation of the overlap of each of the pieces of geometry with each of the screen regions.

At 1240, the method includes generating information about the plurality of pieces of geometry and their relations to the plurality of screen regions, based on the overlap of each of the pieces of geometry with each of the screen regions. The information may simply be that there is an overlap. The information may include the pixel area, or approximate pixel area, that a piece of geometry overlaps or covers in a screen region. The information may include the number of pixels written to a screen region. The information may include the number of vertices or primitives overlapping a screen region, or an approximation thereof.
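As one possible way to produce an approximate overlap of the kind listed above, the sketch below intersects a piece's screen-space bounding box with a uniform grid of square screen regions and records the overlapped pixel area per region. The bounding-box approximation, the grid layout, and all names are assumptions made for illustration; a bounding box generally overestimates the area actually covered by the geometry.

```python
def overlap_info(bbox, region_size, grid_w, grid_h):
    """Approximate overlap of a piece's bounding box with each screen region.

    bbox: (x0, y0, x1, y1) in pixels, exclusive on the maximum side.
    Returns dict region id -> overlapped pixel area, for overlapped regions
    only, where region id = row * grid_w + column.
    """
    x0, y0, x1, y1 = bbox
    info = {}
    for row in range(grid_h):
        for col in range(grid_w):
            rx0, ry0 = col * region_size, row * region_size
            # Intersect the bounding box with this region's rectangle.
            w = min(x1, rx0 + region_size) - max(x0, rx0)
            h = min(y1, ry0 + region_size) - max(y0, ry0)
            if w > 0 and h > 0:
                info[row * grid_w + col] = w * h
    return info

# A 32x24 pixel box straddling the boundary between regions 0 and 1
# of a 2x2 grid of 64x64 pixel regions.
info = overlap_info((48, 16, 80, 40), region_size=64, grid_w=2, grid_h=2)
# info == {0: 384, 1: 384}
```

The keys of the result alone give the "there is an overlap" form of the information, while the values give the approximate pixel area form.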

At 1250, the method includes dynamically assigning the plurality of screen regions to the plurality of GPUs, based on the information, for rendering the plurality of pieces of geometry during the geometry pass phase of rendering. That is, the information can be used in the subsequent assignment of screen regions to the GPUs. For example, each GPU is assigned to corresponding screen regions based on the information. In this way, each GPU has a corresponding division of responsibility (e.g., corresponding screen regions) for rendering the image frame. Thus, the assignment of screen regions to GPUs may differ from image frame to image frame.

Further, the method includes rendering, during the geometry pass phase, the plurality of pieces of geometry at each of the plurality of GPUs based on the GPU-to-screen-region assignments determined from assigning the plurality of screen regions to the plurality of GPUs.

FIG. 12B is a rendering timing diagram 1200B illustrating an analysis pre-pass performed prior to rendering an image frame (e.g., rendering during the geometry pass phase), in accordance with one embodiment of the present disclosure. The analysis pre-pass is dedicated to analyzing the relationship between pieces of geometry and screen regions. It generates information used for the dynamic assignment of screen regions to GPUs for geometry rendering of the image frame. In particular, rendering timing diagram 1200B illustrates the use of multiple GPUs to collaboratively render an image frame. Responsibility for rendering is divided among the GPUs based on screen regions. As previously described, prior to rendering the geometry of the image frame, the GPUs generate information regarding the geometry and its relation to the screen regions. This information is used to assign GPUs to screen regions, allowing for more efficient rendering. For example, prior to rendering, a first GPU generates information about a piece of geometry and its relation to the screen regions, and this information is used in assigning screen regions to the one or more "rendering GPUs" that render that piece of geometry.

Specifically, rendering timing diagram 1200B illustrates rendering of one or more objects by four GPUs (e.g., GPU-A, GPU-B, GPU-C, and GPU-D) with reference to timeline 1290. As previously described, the use of four GPUs is merely for purposes of illustration, and a multi-GPU architecture may include one or more GPUs. Vertical line 1201 indicates the start of a set of rendering phases for the image frame. Vertical line 1201 also indicates the start of analysis pre-pass 1210. In the analysis pre-pass, objects are distributed among the GPUs; with four GPUs, each GPU processes approximately one-fourth of the objects. Sync point 1230A is provided so that each GPU begins its respective geometry pass rendering phase 1220 at the same time. That is, in one embodiment, sync operation 1230a ensures a simultaneous start of the geometry pass by all GPUs. In another embodiment, as previously described, sync operation 1230a is not used, and the geometry pass phase of rendering may begin for any GPU that has finished the analysis pre-pass, without waiting for all other GPUs to finish their corresponding analysis pre-passes.
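The effect of a sync point between the analysis pre-pass and the geometry pass can be mimicked on a CPU with a barrier. The sketch below is only an analogy, assuming one Python thread per GPU and using `threading.Barrier` in place of a hardware sync operation; the function and phase names are hypothetical.

```python
import threading


def run_frame(gpu_names=("GPU-A", "GPU-B", "GPU-C", "GPU-D")):
    events = []                      # (gpu, phase) in the order phases are reached
    lock = threading.Lock()
    sync_point = threading.Barrier(len(gpu_names))

    def gpu_worker(name):
        with lock:
            events.append((name, "analysis_prepass_done"))
        # No worker passes this line until every worker has reached it.
        sync_point.wait()
        with lock:
            events.append((name, "geometry_pass_start"))

    threads = [threading.Thread(target=gpu_worker, args=(n,)) for n in gpu_names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return events


for gpu, phase in run_frame():
    print(gpu, phase)
```

Removing the `sync_point.wait()` call corresponds to the alternative embodiment, in which a GPU that finishes its pre-pass starts the geometry pass without waiting for the others.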

Sync point 1230b indicates the end of the geometry pass phase of rendering for the current image frame, and is also provided so that each GPU can continue with subsequent phases of rendering for the current frame at the same time, or begin rendering the next image frame at the same time.

In one embodiment, a single command buffer is used by the multiple GPUs to render the corresponding image frame. The rendering command buffer may include commands to set state and commands to execute primitive shaders or compute shaders in order to perform the analysis pre-pass. Sync operations may be included in the command buffer to synchronize the start of various operations by the GPUs. For example, a sync operation may be used to synchronize the start of the geometry pass phase of rendering by the GPUs. As such, the command buffer may include draw calls and state settings for each object to perform the geometry pass phase of rendering.

In one embodiment, generation of the information is accelerated through the use of one or more dedicated instructions. That is, the shader that generates the information uses one or more dedicated instructions to accelerate generating the information regarding a piece of geometry and its relation to the screen regions.

In one embodiment, the instruction may calculate the exact overlap between a primitive of a piece of geometry and each of the screen regions. For example, FIG. 13A is a diagram 1310 illustrating calculation of the exact overlap between primitive 1350 and one or more screen regions when performing the analysis pre-pass phase to generate information used for the dynamic assignment of screen regions to GPUs for geometry rendering of an image frame, in accordance with one embodiment of the present disclosure. For example, primitive 1350 is shown overlapping three different regions, and the overlap of the respective portion of primitive 1350 is determined exactly for each of those regions.
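One way to picture an exact overlap computation in software is to clip the primitive against each rectangular screen region and measure the area that remains. The sketch below uses Sutherland-Hodgman clipping and the shoelace formula; this choice of technique, the function names, and the 8x8 screen split into four regions are assumptions for illustration, not the behavior of the dedicated instruction itself.

```python
def clip_polygon(poly, xmin, ymin, xmax, ymax):
    """Clip a convex polygon to an axis-aligned rectangle (Sutherland-Hodgman)."""
    def clip(points, inside, intersect):
        out = []
        for i, cur in enumerate(points):
            prev = points[i - 1]          # wraps to the last point when i == 0
            if inside(cur):
                if not inside(prev):
                    out.append(intersect(prev, cur))
                out.append(cur)
            elif inside(prev):
                out.append(intersect(prev, cur))
        return out

    def at_x(x):   # intersection of segment p-q with the vertical line at x
        return lambda p, q: (x, p[1] + (q[1] - p[1]) * (x - p[0]) / (q[0] - p[0]))

    def at_y(y):   # intersection of segment p-q with the horizontal line at y
        return lambda p, q: (p[0] + (q[0] - p[0]) * (y - p[1]) / (q[1] - p[1]), y)

    edges = (
        (lambda p: p[0] >= xmin, at_x(xmin)),
        (lambda p: p[0] <= xmax, at_x(xmax)),
        (lambda p: p[1] >= ymin, at_y(ymin)),
        (lambda p: p[1] <= ymax, at_y(ymax)),
    )
    for inside, intersect in edges:
        if not poly:
            break
        poly = clip(poly, inside, intersect)
    return poly


def polygon_area(poly):
    """Shoelace formula; returns 0.0 for an empty polygon."""
    pairs = zip(poly, poly[1:] + poly[:1])
    return abs(sum(x0 * y1 - x1 * y0 for (x0, y0), (x1, y1) in pairs)) / 2.0


def exact_overlap(triangle, region):
    return polygon_area(clip_polygon(list(triangle), *region))


# Hypothetical primitive on an 8x8 screen split into four 4x4 regions.
triangle = [(0, 0), (8, 0), (0, 8)]
regions = {"A": (0, 0, 4, 4), "B": (4, 0, 8, 4), "C": (0, 4, 4, 8), "D": (4, 4, 8, 8)}
for name, rect in regions.items():
    print(name, exact_overlap(triangle, rect))
```

In this example the per-region areas add up to the area of the whole triangle, which is a convenient sanity check for an exact method.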

In other embodiments, to reduce the complexity of implementing the instruction, the instruction may compute an approximation of the overlap area, in which case the information includes the approximate area over which a primitive overlaps one or more screen regions. Specifically, the instruction may calculate an approximate overlap between a primitive of a piece of geometry and one or more screen regions. For example, FIG. 13B is a pair of diagrams illustrating calculation of the approximate overlap between a piece of geometry and multiple screen regions when performing the analysis pre-pass phase to generate information used for the dynamic assignment of screen regions to GPUs for geometry rendering of an image frame, in accordance with one embodiment of the present disclosure.

As shown in the left-hand diagram of FIG. 13B, the instruction may use a bounding box of the primitive. The overlap of the bounding box of primitive 1350 with one or more screen regions is then determined. Boundary 1320A indicates the approximate overlap of the piece of geometry 1350 as determined through bounding box analysis.
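The bounding-box approximation can be sketched in a few lines: take the axis-aligned bounding box of the primitive and intersect it with each rectangular screen region. The function name and the sample coordinates below are hypothetical.

```python
def bbox_overlap_estimate(triangle, region):
    """Approximate overlap: area shared by the primitive's bounding box and a region.

    region is (xmin, ymin, xmax, ymax); triangle is a list of (x, y) vertices.
    """
    xs = [x for x, _ in triangle]
    ys = [y for _, y in triangle]
    rx0, ry0, rx1, ry1 = region
    width = min(max(xs), rx1) - max(min(xs), rx0)
    height = min(max(ys), ry1) - max(min(ys), ry0)
    return max(width, 0) * max(height, 0)


triangle = [(0, 0), (8, 0), (0, 8)]
# The triangle only touches this region at a single corner point, yet its
# bounding box covers the region entirely, so the estimate is conservative.
print(bbox_overlap_estimate(triangle, (4, 4, 8, 8)))
```

The estimate never under-reports overlap, but it can attribute area to regions the primitive barely touches, which is the trade-off made for a simpler instruction.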

In the right-hand diagram of FIG. 13B, the instruction checks the screen regions against the primitive, such that screen regions that the piece of geometry does not overlap are excluded, and a bounding box is generated for the portion of the primitive that overlaps each screen region. Boundary 1320B indicates the approximate overlap of primitive 1350 as determined through bounding box analysis and overlap filtering. Note that bounding box 1320B in the right-hand diagram of FIG. 13B is smaller than bounding box 1320A in the left-hand diagram.

In still other embodiments, to further reduce the complexity of the instruction, the instruction may generate presence information, such as whether a piece of geometry is present in a screen region. For example, the presence information may indicate whether a primitive of a piece of geometry overlaps a screen region. The information may include an approximate presence of the piece of geometry within the corresponding screen region.

In another embodiment, the shader does not allocate space in the position cache or the parameter cache. That is, the shader performs no position or parameter cache allocations, which allows a high degree of parallelism when performing the analysis pre-pass. This also leads to a corresponding reduction in the time required for the analysis pre-pass.

In another embodiment, a single shader is used to perform either the analysis performed in the analysis pre-pass or the rendering in the geometry pass. For example, the shader that generates the information may be configurable either to output information regarding a piece of geometry and its relation to the screen regions, or to output vertex position and parameter information for use by later rendering stages. This can be accomplished in a variety of ways, such as via external hardware state that the shader can check (e.g., the setting of a hardware register) or via an input to the shader. As a result, the shader performs two different functions in rendering the corresponding image frame.

As previously described, the information is used to assign regions to GPUs before the geometry pass phase of rendering begins. Information generated during the rendering of a previous frame (e.g., the actual number of pixels shaded while rendering a piece of geometry) may also be used to assign screen regions to GPUs. Information from the previous frame may include, for example, the actual number of pixels shaded per piece of geometry per screen region. That is, screen regions are assigned to GPUs based on both information generated from the previous image frame (e.g., GPU usage) and information generated during the analysis pre-pass.

Systems and Methods for Efficient Multi-GPU Rendering of Geometry by Subdividing Geometry
With the detailed description of cloud game network 190 (e.g., in game server 160) and GPU resources 365 of FIGS. 1-3, line 1110 of FIG. 14B illustrates a method for graphics processing including multi-GPU rendering of an application by subdividing geometry. Objects 0, 1, and 2 represent geometry used or generated by individual draw calls. Rather than distributing whole objects (i.e., draw calls) to GPU-A, GPU-B, GPU-C, and GPU-D, the GPUs instead divide each object into smaller pieces of geometry, such as pieces roughly of the size at which position and/or parameter caches are allocated. Purely for illustration, object 0 is divided into pieces "a", "b", "c", "d", "e", and "f", like object 610 of FIG. 6B. Object 1 is divided into pieces "g", "h", and "i". Object 2 is divided into pieces "j", "k", "l", "m", "n", and "o". Distribution 1110 (e.g., the ABCDABCDABCD... row) shows an even distribution of responsibility for rendering (or for a phase of rendering) among the GPUs. Because this distribution is finer grained than whole objects (i.e., draw calls), the imbalance in rendering time between GPUs is reduced, and the total time to render (or the time for a phase of rendering) is reduced. Flow diagram 1400A of FIG. 14A and line 1410 of FIG. 14B illustrate a method for graphics processing including multi-GPU rendering of geometry for an application by performing timing analysis during a rendering phase in order to redistribute the assignment of GPU responsibilities during that phase. It is understood that one or more of the various features and advantages of the embodiments described in relation to FIGS. 7-13, regarding generating information before and during rendering and the geometry pass phase of rendering, are equally applicable when subdividing geometry and/or performing timing analysis, and may not be repeated here in order to minimize duplication. As previously described, various architectures may include multiple GPUs collaborating to render a single image by performing multi-GPU rendering of geometry for an application through region testing while rendering, such as within one or more cloud gaming servers of a cloud gaming system, or within a stand-alone system such as a personal computer or gaming console that includes a high-end graphics card having multiple GPUs.
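The interleaved distribution of pieces described above amounts to a round-robin assignment. A minimal sketch, assuming the fifteen pieces "a" through "o" and four GPUs labeled A through D (the function name is hypothetical):

```python
from itertools import cycle


def interleave(pieces, gpus):
    """Assign successive pieces of geometry to GPUs in round-robin order."""
    return {piece: gpu for piece, gpu in zip(pieces, cycle(gpus))}


# Pieces a-f belong to object 0, g-i to object 1, and j-o to object 2.
pieces = list("abcdefghijklmno")
assignment = interleave(pieces, "ABCD")
print("".join(assignment[p] for p in pieces))
```

Because the unit of distribution is a piece rather than a whole draw call, an unusually large object is spread across all GPUs instead of landing on one of them.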

In some embodiments, as previously described in relation to FIGS. 7-13, responsibility for GPU rendering is assigned, either fixedly or dynamically, among a plurality of screen regions for each image frame, such that each GPU renders objects in its assigned screen regions. In other embodiments, each GPU renders to its own Z-buffer or other render target. Timing analysis is performed during one or more of the phases of rendering (e.g., geometry pre-pass analysis, Z pre-pass, or geometry rendering) for the purpose of redistributing the assignment of GPU responsibilities during those phases. In one implementation, for example, the timing analysis is performed while the Z pre-pass phase is executed on pieces of geometry to generate information used for the dynamic assignment of screen regions to GPUs for geometry rendering of the image frame. For example, a screen region initially assigned to one GPU may be reassigned to another GPU during a phase of rendering (e.g., because the first GPU may be lagging behind other GPUs during that phase).

At 1410, the method includes rendering graphics for an application using a plurality of graphics processing units (GPUs). Specifically, multi-GPU processing is performed when rendering a single image frame and/or each of one or more image frames of a sequence of image frames for a real-time application. That is, the GPUs work in collaboration to render a corresponding image frame that includes a plurality of pieces of geometry.

At 1420, the method includes dividing responsibility for rendering the geometry of the graphics among the plurality of GPUs based on a plurality of screen regions. That is, each GPU has a corresponding division of the responsibility (a corresponding set of screen regions).

While geometry is being rendered or analyzed, the time taken for rendering or analysis is used to adjust the division of responsibility with respect to the objects. In particular, at 1430, the method includes determining, during a phase of rendering or analysis for an image frame, that a first GPU has fallen behind at least one other GPU, such as a second GPU. At 1440, the method includes dynamically assigning geometry such that the first GPU is assigned less geometry than the second GPU.

For example, the dynamic assignment of geometry may be performed during the generation of a Z-buffer, for purposes of illustration. The dynamic assignment of geometry may also be performed during the analysis pre-pass and/or the geometry pass phase of rendering. When geometry is dynamically assigned during Z-buffer generation and Z pre-pass analysis, one or more Z-buffers are generated by the plurality of GPUs and/or merged in collaboration for the image frame during the Z pre-pass phase of rendering. Specifically, pieces of geometry are divided among the GPUs for processing the Z pre-pass phase, with each piece of geometry assigned to a corresponding GPU. Instead of using the hardware during a Z pre-pass phase to generate the information used to optimize rendering of the corresponding image frame, the hardware may be configured to perform an analysis pre-pass to generate information used, for example, to optimize the rendering speed of the subsequent geometry pass.

Specifically, an object may be subdivided into smaller pieces, as previously described in relation to FIG. 6B. Responsibility for rendering pieces of geometry in the Z pre-pass phase of rendering is distributed among the GPUs in interleaved fashion, as previously described in relation to distribution 1110 of FIG. 14B. FIG. 14B shows various distributions of GPU assignments for performing the Z pre-pass phase of rendering to generate information used for the dynamic assignment of screen regions to GPUs for geometry rendering of the image frame. Distribution 1110 shows the distribution of rendering responsibility among the GPUs for the Z pre-pass. As previously described, each GPU is assigned to a corresponding portion of the geometry of the image frame, and that portion may be further partitioned into pieces of geometry. As shown in distribution 1110, successive pieces of geometry are assigned to different GPUs, and as a result the rendering time during the Z pre-pass is roughly balanced.

As shown in distribution 1410, further balancing of rendering time between GPUs can be achieved by dynamically adjusting the responsibility for rendering pieces of geometry. Distribution 1410 is the distribution of pieces of geometry to GPUs when performing the Z pre-pass phase of rendering, and it is adjusted dynamically during that phase. For example, distribution 1410 (the ABCDABCDBCDBBCD row) shows an asymmetric distribution of responsibility for performing the Z pre-pass phase among the GPUs. An asymmetric distribution may be advantageous, for example, when a particular GPU has fallen behind the other GPUs in the Z pre-pass because it was assigned pieces of geometry larger than those assigned to the other GPUs.

As shown in distribution 1410, GPU-A has taken more time to render pieces of geometry during the Z pre-pass phase, so it is skipped when pieces of geometry are assigned to GPUs. For example, instead of having GPU-A process piece of geometry "i" of object 1 during Z pre-pass rendering, GPU-B is assigned to render that piece during the Z pre-pass phase. As such, GPU-B is assigned more pieces of geometry than GPU-A during the Z pre-pass phase of rendering. Specifically, during the Z pre-pass phase, a piece of geometry is unassigned from a first GPU and assigned to a second GPU. In addition, GPU-B is ahead of the other GPUs and can therefore process more geometry during the Z pre-pass phase. That is, distribution 1410 shows GPU-B repeatedly assigned to successive pieces of geometry for Z pre-pass rendering. For example, GPU-B is assigned to process pieces of geometry "l" and "m" of object 2 during the Z pre-pass phase.
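The skipping behavior can be illustrated with a small simulation. The sketch below walks the GPUs in round-robin order but passes over any GPU whose accumulated time exceeds that of the least-busy GPU by more than a threshold. The threshold rule, the function name `assign_dynamic`, and the piece costs are assumptions for this example; the disclosure only states that a lagging GPU receives less geometry, and the sketch does not attempt to reproduce the exact row shown in FIG. 14B.

```python
def assign_dynamic(piece_costs, gpus, lag_threshold):
    """Round-robin assignment that skips GPUs lagging behind the others.

    piece_costs is a list of (piece, time) pairs; the times stand in for what
    timing analysis would measure while the phase is running.
    """
    elapsed = {gpu: 0.0 for gpu in gpus}
    assignment = []
    next_index = 0
    for piece, cost in piece_costs:
        fastest = min(elapsed.values())
        # The least-busy GPU always qualifies, so this loop always finds one
        # (for a non-negative threshold).
        for step in range(len(gpus)):
            gpu = gpus[(next_index + step) % len(gpus)]
            if elapsed[gpu] - fastest <= lag_threshold:
                break
        assignment.append((piece, gpu))
        elapsed[gpu] += cost
        next_index = (gpus.index(gpu) + 1) % len(gpus)
    return assignment, elapsed


# Hypothetical costs: piece "a" is much larger than the rest, so GPU-A lags.
costs = [("a", 5.0)] + [(p, 1.0) for p in "bcdefgh"]
assignment, elapsed = assign_dynamic(costs, ["A", "B", "C", "D"], lag_threshold=2.0)
print("".join(gpu for _, gpu in assignment))
```

With these numbers GPU-A is passed over after its first, expensive piece, so the other GPUs absorb the remaining work while GPU-A catches up.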

Although the above is presented in terms of "dynamic assignment" of geometry, it is equally valid to view it in terms of "assignment" and "reassignment". For example, as shown in distribution 1410, GPU-A has taken more time to render pieces of geometry during the Z pre-pass phase, so work is reassigned away from it. For example, instead of having GPU-A process piece of geometry "i" of object 1 during Z pre-pass rendering, GPU-B is assigned to render that piece during the Z pre-pass phase, even though GPU-A may have been initially assigned to render it. In addition, GPU-B is ahead of the other GPUs and can therefore process more geometry during the Z pre-pass phase. That is, distribution 1410 shows the repeated assignment or reassignment of GPU-B to successive pieces of geometry for Z pre-pass rendering. For example, GPU-B is assigned to process pieces of geometry "l" and "m" of object 2 during the Z pre-pass phase. That is, GPU-B is assigned to render piece of geometry "l" of object 2 even though that piece may have been initially assigned to GPU-A. As such, a piece of geometry initially assigned to a first GPU is reassigned to a second GPU (which may be ahead in rendering) during the Z pre-pass phase of rendering.

Although the assignment of pieces of geometry to GPUs during the Z pre-pass phase may not be balanced, the processing performed by the GPUs during the Z pre-pass phase may turn out to be roughly balanced (e.g., each GPU spends approximately the same amount of time performing the Z pre-pass phase of rendering).

In another embodiment, the dynamic assignment of geometry may be performed during the geometry pass phase of rendering of an image frame. For example, screen regions are assigned to GPUs during the geometry pass phase of rendering based on information generated during a Z pre-pass or analysis pre-pass. A screen region assigned to one GPU may be reassigned to another GPU during the rendering phase. This may increase efficiency, because GPUs that are ahead of others may be allocated additional screen regions, while GPUs that are behind others may avoid being allocated additional screen regions. In particular, the plurality of GPUs in collaboration generates a Z-buffer for the image frame during a Z pre-pass phase of rendering. During this Z pre-pass, information is generated regarding the pieces of geometry of the image frame and their relations to a plurality of screen regions. Screen regions are assigned to GPUs based on the information for rendering the image frame during the geometry pass phase of rendering. The GPUs render the pieces of geometry during the geometry pass phase based on the GPU-to-screen-region assignments. Timing analysis is performed during the geometry pass phase of rendering, with the result that a first piece of geometry initially assigned to a first GPU may be reassigned to a second GPU for rendering during the geometry pass phase. For example, in one embodiment, the first GPU may be behind in processing the geometry pass phase of rendering. In another embodiment, the second GPU may be ahead in processing the geometry pass phase of rendering.

FIGS. 15A-15B show various screen region allocation strategies, which may be applied to the rendering of image frames described previously in relation to FIGS. 7-14.

Specifically, FIG. 15A is a diagram illustrating the use of multiple GPUs to render pieces of geometry (e.g., geometry related to objects 0-3) in a particular screen region, in accordance with one embodiment of the present disclosure. That is, screen region 1510 may be assigned to multiple GPUs for rendering. This may increase efficiency, for example, when there is very dense geometry late in the rendering phase. Assigning screen region 1510 to multiple GPUs typically requires subdividing the screen region, so that each GPU is responsible for a portion or portions of the screen region.

FIG. 15B is a diagram illustrating the rendering of pieces of geometry out of order relative to their corresponding draw calls, in accordance with one embodiment of the present disclosure. In particular, the rendering order of pieces of geometry may not match the order of their corresponding draw calls in the corresponding command buffer. As shown in this example, object 0 precedes object 1 in the rendering command buffer. However, objects 0 and 1 intersect, such as within screen region C. In that case, a strict order of rendering may need to be observed for region C. That is, object 0 must be rendered before object 1 in region C.

On the other hand, the objects in region A and region B may be rendered in any order, because there is no intersection. That is, when rendering region A and/or region B, object 1 may precede object 0, or vice versa.

In yet another embodiment, if the rendering command buffer can be traversed multiple times, certain screen regions (e.g., high-cost regions) may be rendered in a first traversal, and the remaining regions (e.g., low-cost regions) in a second or subsequent traversal. The resulting rendering order of pieces of geometry may not match the order of the corresponding draw calls, such as when the first object is rendered in the second traversal. Because load balancing between GPUs is easier for low-cost regions than for high-cost regions, this strategy increases efficiency when rendering the corresponding image frame.
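The multi-traversal strategy can be sketched as two walks over the same command buffer, the first emitting only work that falls in high-cost regions and the second emitting everything else. The data layout (a list of draw calls with the regions each one overlaps), the function name, and the region labels are assumptions for this example.

```python
def multi_traversal_order(draw_calls, high_cost_regions):
    """Return (object, region) render order for a two-traversal strategy.

    draw_calls is the command buffer in submission order: a list of
    (object, [regions it overlaps]) pairs.
    """
    order = []
    for want_high_cost in (True, False):      # traversal 1, then traversal 2
        for obj, regions in draw_calls:
            for region in regions:
                if (region in high_cost_regions) == want_high_cost:
                    order.append((obj, region))
    return order


# Hypothetical buffer: both objects overlap region C, which is the costly one.
draw_calls = [("object0", ["A", "C"]), ("object1", ["B", "C"])]
for obj, region in multi_traversal_order(draw_calls, high_cost_regions={"C"}):
    print(obj, region)
```

Within any single region the draw calls still appear in command-buffer order, so an ordering requirement such as object 0 before object 1 in region C is preserved, while across regions the order differs from the order of the draw calls.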

FIG. 16 illustrates components of an example device 1600 that can be used to perform aspects of the various embodiments of the present disclosure. For example, FIG. 16 illustrates an exemplary hardware system suitable for multi-GPU rendering of geometry for an application by performing geometry analysis during rendering to dynamically assign screen regions to GPUs for geometry rendering of an image frame, and/or by performing geometry analysis prior to rendering to dynamically assign screen regions to GPUs for geometry rendering of an image frame, and/or by subdividing pieces of geometry and assigning the resulting smaller portions of geometry to multiple GPUs, in accordance with embodiments of the present disclosure. This block diagram illustrates a device 1600 that can incorporate or can be a personal computer, a server computer, a game console, a mobile device, or another digital device, each of which is suitable for practicing an embodiment of the invention. Device 1600 includes a central processing unit (CPU) 1602 for running software applications and, optionally, an operating system. CPU 1602 may be composed of one or more homogeneous or heterogeneous processing cores.

According to various embodiments, CPU 1602 is one or more general-purpose microprocessors having one or more processing cores. Further embodiments can be implemented using one or more CPUs with microprocessor architectures specifically adapted for highly parallel and computationally intensive applications, such as media and interactive entertainment applications, or applications configured for graphics processing during execution of a game.

Memory 1604 stores applications and data for use by CPU 1602 and GPU 1616. Storage 1606 provides non-volatile storage and other computer-readable media for applications and data, and may include fixed disk drives, removable disk drives, flash memory devices, CD-ROM, DVD-ROM, Blu-ray (registered trademark), HD-DVD, or other optical storage devices, as well as signal transmission and storage media. User input devices 1608 communicate user inputs from one or more users to device 1600; examples include keyboards, mice, joysticks, touch pads, touch screens, still or video recorders/cameras, and/or microphones. Network interface 1609 allows device 1600 to communicate with other computer systems via an electronic communications network, and may include wired or wireless communication over local area networks and wide area networks such as the Internet. Audio processor 1612 is adapted to generate analog or digital audio output from instructions and/or data provided by CPU 1602, memory 1604, and/or storage 1606. The components of device 1600, including CPU 1602, the graphics subsystem including GPU 1616, memory 1604, data storage 1606, user input devices 1608, network interface 1609, and audio processor 1612, are connected via one or more data buses 1622.

Graphics subsystem 1614 is further connected with data bus 1622 and the components of device 1600. Graphics subsystem 1614 includes at least one graphics processing unit (GPU) 1616 and graphics memory 1618. Graphics memory 1618 includes a display memory (e.g., a frame buffer) used for storing pixel data for each pixel of an output image. Graphics memory 1618 can be integrated in the same device as GPU 1616, connected as a separate device with GPU 1616, and/or implemented within memory 1604. Pixel data can be provided to graphics memory 1618 directly from CPU 1602. Alternatively, CPU 1602 provides GPU 1616 with data and/or instructions defining the desired output images, from which GPU 1616 generates the pixel data of one or more output images. The data and/or instructions defining the desired output images can be stored in memory 1604 and/or graphics memory 1618. In one embodiment, GPU 1616 includes 3D rendering capabilities for generating pixel data for output images from instructions and data defining the geometry, lighting, shading, texturing, motion, and/or camera parameters of a scene. GPU 1616 can further include one or more programmable execution units capable of executing shader programs.

Graphics subsystem 1614 periodically outputs pixel data for an image from graphics memory 1618 to be displayed on display device 1610 or projected by a projection system (not shown). Display device 1610 can be any device capable of displaying visual information in response to a signal from device 1600, including CRT, LCD, plasma, and OLED displays. Device 1600 can provide display device 1610 with, for example, an analog or digital signal.

Other embodiments for optimizing graphics subsystem 1614 can include multi-GPU rendering of geometry for an application by pretesting the geometry against interleaved screen regions before rendering objects of an image frame. Graphics subsystem 1614 may be configured as one or more processing devices.

For example, in one embodiment, graphics subsystem 1614 may be configured to perform multi-GPU rendering of geometry for an application by region testing while rendering, wherein multiple graphics subsystems may implement the graphics and/or rendering pipeline for a single game. That is, graphics subsystem 1614 includes multiple GPUs used to render an image, or each of one or more images of a sequence of images, when executing the application.

In other embodiments, graphics subsystem 1614 includes multiple GPU devices that are combined to perform graphics processing for a single application executing on a corresponding CPU. For example, the multiple GPUs can perform multi-GPU rendering of geometry for an application by region testing while rendering objects of an image. In another example, the multiple GPUs can perform alternate frame rendering, in which, in sequential frame periods, GPU1 renders a first frame, GPU2 renders a second frame, and so on; upon reaching the last GPU, the first GPU renders the next video frame (e.g., if there are only two GPUs, GPU1 renders the third frame). That is, the GPUs rotate when rendering frames. The rendering operations can overlap, such that GPU2 may begin rendering the second frame before GPU1 finishes rendering the first frame. In another implementation, the multiple GPU devices can be assigned different shader operations in the rendering and/or graphics pipeline, with a master GPU performing the main rendering and compositing. For example, in a group of three GPUs, master GPU1 can perform the main rendering (e.g., a first shader operation) and the compositing of outputs from slave GPU2 and slave GPU3, where slave GPU2 can perform a second shader operation (e.g., fluid effects such as a river) and slave GPU3 can perform a third shader operation (e.g., particle smoke), and master GPU1 composites the results from each of GPU1, GPU2, and GPU3. In this manner, different GPUs can be assigned to perform different shader operations (e.g., flag waving, wind, smoke generation, fire) to render a video frame. In yet another embodiment, each of the three GPUs can be assigned to different objects and/or parts of a scene corresponding to a video frame. In the above embodiments and implementations, these operations can be performed in the same frame period (simultaneously in parallel) or in different frame periods (sequentially in parallel).
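The GPU rotation described for alternate frame rendering amounts to a round-robin assignment of frames to GPUs. The sketch below uses 1-based frame and GPU numbers to match the GPU1/GPU2 naming in the text; the function name `gpu_for_frame` is a hypothetical helper introduced only for illustration.

```python
def gpu_for_frame(frame_number: int, num_gpus: int) -> int:
    """Return the 1-based GPU number that renders a 1-based frame number.

    Frames are handed out in rotation: after the last GPU has taken a
    frame, the first GPU takes the next one.
    """
    if frame_number < 1 or num_gpus < 1:
        raise ValueError("frame_number and num_gpus must be at least 1")
    return (frame_number - 1) % num_gpus + 1


# With two GPUs, GPU1 renders frames 1 and 3 and GPU2 renders frames 2 and 4.
SCHEDULE = [gpu_for_frame(frame, 2) for frame in range(1, 5)]
```

This only captures which GPU owns which frame; the overlap in time between consecutive frames mentioned in the text is a scheduling matter that this sketch does not model.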

Accordingly, the present disclosure describes methods and systems configured for multi-GPU rendering of geometry for an application by performing geometry analysis during rendering to dynamically assign screen regions to GPUs for geometry rendering of an image frame, and/or by performing geometry analysis prior to rendering to dynamically assign screen regions to GPUs for geometry rendering of an image frame, and/or by subdividing pieces of geometry and assigning the resulting smaller portions of geometry to multiple GPUs.
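To make the overall flow concrete, the following sketch estimates a per-region cost from bounding-box overlap and then assigns regions to GPUs. It rests on several assumptions that are not stated in this passage: boxes and regions are `(x0, y0, x1, y1)` tuples, cost is approximated by summed overlap area, and the assignment uses a greedy "most expensive region to least loaded GPU" heuristic as a stand-in for whatever allocation policy an actual implementation would use.

```python
import heapq
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), half-open


def overlap_area(box: Box, region: Box) -> int:
    """Area shared by a bounding box and a screen region (0 if disjoint)."""
    width = min(box[2], region[2]) - max(box[0], region[0])
    height = min(box[3], region[3]) - max(box[1], region[1])
    return max(width, 0) * max(height, 0)


def analyze(pieces: List[Box], regions: Dict[str, Box]) -> Dict[str, int]:
    """Analysis step: estimated cost per region from bounding-box overlap."""
    return {name: sum(overlap_area(box, region) for box in pieces)
            for name, region in regions.items()}


def allocate(costs: Dict[str, int], num_gpus: int) -> Dict[str, int]:
    """Greedy assignment: costliest region goes to the least loaded GPU."""
    heap = [(0, gpu) for gpu in range(num_gpus)]  # (current load, GPU index)
    assignment: Dict[str, int] = {}
    for name, cost in sorted(costs.items(), key=lambda kv: (-kv[1], kv[0])):
        load, gpu = heapq.heappop(heap)
        assignment[name] = gpu
        heapq.heappush(heap, (load + cost, gpu))
    return assignment


# Hypothetical 2x2 grid of screen regions and four pieces of geometry.
REGIONS = {"A": (0, 0, 10, 10), "B": (10, 0, 20, 10),
           "C": (0, 10, 10, 20), "D": (10, 10, 20, 20)}
PIECES = [(0, 0, 10, 10), (0, 0, 5, 5), (12, 2, 16, 6), (2, 12, 6, 16)]
COSTS = analyze(PIECES, REGIONS)
ASSIGNMENT = allocate(COSTS, num_gpus=2)
```

In this example region A carries most of the estimated cost, so it is given to one GPU by itself while the other GPU takes the three cheaper regions. Overlap area is a deliberately crude proxy; pixel or vertex counts per region could be substituted without changing the structure.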

It should be understood that the various embodiments defined herein may be combined or assembled into specific implementations using the various features disclosed herein. Thus, the examples provided are only some of the possible examples and do not limit the various implementations, of which many more can be defined by combining the various elements. In some examples, an implementation may include fewer elements without departing from the spirit of the disclosed or equivalent implementations.

Embodiments of the present disclosure may be practiced with various computer system configurations, including handheld devices, microprocessor systems, microprocessor-based or programmable consumer electronics, minicomputers, mainframe computers, and the like. Embodiments of the present disclosure can also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a wire-based or wireless network.

With the above embodiments in mind, it should be understood that embodiments of the present disclosure can employ various computer-implemented operations involving data stored in computer systems. These operations are those requiring physical manipulation of physical quantities. Any of the operations described herein that form part of embodiments of the present disclosure are useful machine operations. Embodiments of the disclosure also relate to a device or an apparatus for performing these operations. The apparatus can be specially constructed for the required purpose, or it can be a general-purpose computer selectively activated or configured by a computer program stored in the computer. In particular, various general-purpose machines can be used with computer programs written in accordance with the teachings herein, or it may be more convenient to construct a more specialized apparatus to perform the required operations.

The present disclosure can also be embodied as computer-readable code on a computer-readable medium. The computer-readable medium is any data storage device that can store data which can thereafter be read by a computer system. Examples of the computer-readable medium include hard drives, network attached storage (NAS), read-only memory, random-access memory, CD-ROMs, CD-Rs, CD-RWs, magnetic tapes, and other optical and non-optical data storage devices. The computer-readable medium can include computer-readable tangible media distributed over a network-coupled computer system so that the computer-readable code is stored and executed in a distributed fashion.

Although the method operations were described in a specific order, it should be understood that other housekeeping operations may be performed between operations, that operations may be adjusted so that they occur at slightly different times, or that operations may be distributed in a system that allows the processing operations to occur at various intervals associated with the processing, as long as the processing of the overlay operations is performed in the desired way.

Although the foregoing disclosure has been described in some detail for purposes of clarity of understanding, it will be apparent that certain changes and modifications can be practiced within the scope of the appended claims. Accordingly, the present embodiments are to be considered as illustrative and not restrictive, and embodiments of the present disclosure are not to be limited to the details given herein, but may be modified within the scope and equivalents of the appended claims.

Claims

A method for graphic processing, the method comprising:
rendering graphics for an application using a plurality of graphics processing units (GPUs);
dividing responsibility for processing a plurality of pieces of geometry of an image frame during an analysis pre-pass phase of rendering between the plurality of GPUs, each of the plurality of pieces of geometry being uniquely assigned to a corresponding GPU for performing the processing during the analysis pre-pass phase;
determining, in the analysis pre-pass phase, an overlap of each of the plurality of pieces of geometry with each of a plurality of screen regions, the analysis pre-pass phase being performed during a frame period;
generating information about the plurality of pieces of geometry and their relationships to the plurality of screen regions based on the overlap of each of the plurality of pieces of geometry with each of the plurality of screen regions, the generating of the information being performed during the frame period; and
dynamically assigning the plurality of screen regions to the plurality of GPUs based on the information, for rendering the plurality of pieces of geometry during a subsequent phase of rendering performed during the frame period.

The method of claim 1, wherein the analysis pre-pass phase is performed using a vertex shader or a compute shader.

The method of claim 1, wherein determining the overlap comprises estimating the overlap of each of the plurality of pieces of geometry with each of the plurality of screen regions.

The method of claim 3, wherein estimating the overlap comprises determining an overlap of one or more bounding boxes of one or more primitives of a piece of geometry with each of the plurality of screen regions.

The method of claim 4, wherein one or more screen regions having no overlap are excluded.

The method of claim 1, further comprising rendering, by the plurality of GPUs, the plurality of pieces of geometry during the subsequent phase of rendering.

The method of claim 1, further comprising determining GPU usage when rendering a previous image frame, wherein the plurality of screen regions is assigned to the plurality of GPUs based on the information and on the GPU usage when rendering the previous image frame.

The method of claim 1, wherein the plurality of pieces of geometry corresponds to geometry used or generated by draw calls, or wherein the geometry used or generated by a draw call is subdivided into smaller pieces of geometry corresponding to the plurality of pieces of geometry, such that the information is generated for the smaller pieces of geometry.

The method of claim 1, wherein the information includes an exact or approximate area that primitives of a piece of geometry occupy in a corresponding region.

The method of claim 1, wherein the information includes a number of pixels shaded per screen region, or wherein the information includes a number of vertices per screen region.

The method of claim 1, wherein corresponding information may or may not be generated depending on one or more properties of a corresponding piece of geometry.

The method of claim 1, further comprising determining a plurality of costs for rendering the plurality of pieces of geometry during the subsequent phase of rendering, wherein the plurality of costs is taken into account when dynamically assigning the plurality of screen regions to the plurality of GPUs.

The method of claim 1, wherein the information is generated by one or more shaders, and wherein the one or more shaders use at least one dedicated instruction to accelerate generation of the information.

The method of claim 1, wherein the information is generated by one or more shaders, and wherein the one or more shaders do not perform allocation of a position cache or a parameter cache.

The method of claim 1, wherein the information is generated by one or more shaders, and wherein the one or more shaders are configurable either to output the information or to output vertex position and parameter information for use in the subsequent phase of rendering.

The method of claim 1, wherein at least one of the plurality of GPUs is assigned to a screen region before or during the subsequent phase of rendering.

The method of claim 1, wherein a screen region initially assigned to a first GPU is reassigned to a second GPU during the subsequent phase of rendering.

The method of claim 1, wherein a screen region is assigned to two or more of the plurality of GPUs.

The method of claim 1, wherein a rendering order of the plurality of pieces of geometry does not match an order of corresponding draw calls in a rendering command buffer.

The method of claim 19, wherein the rendering command buffer is shared among the plurality of GPUs as a common rendering command buffer, and wherein a format of the common rendering command buffer allows a command to be executed by only a subset of the plurality of GPUs.

The method of claim 1, wherein the information allows relaxation of rendering phase dependencies, such that a first GPU proceeds to the subsequent phase of rendering while a second GPU is still processing the analysis pre-pass phase.

The method of claim 1, wherein the information is used to schedule a transfer of Z-buffer or render target data for a screen region from a second GPU to a first GPU.

The method of claim 1, wherein one or more of the plurality of GPUs are portions of a larger GPU that is configured as a plurality of virtual GPUs.

A computer system comprising:
a processor; and
a memory coupled to the processor and storing instructions that, when executed by the computer system, cause the computer system to perform a method for graphics processing, the method comprising:
rendering graphics for an application using a plurality of graphics processing units (GPUs);
dividing responsibility for processing a plurality of pieces of geometry of an image frame during an analysis pre-pass phase of rendering between the plurality of GPUs, each of the plurality of pieces of geometry being uniquely assigned to a corresponding GPU for performing the processing during the analysis pre-pass phase;
determining, in the analysis pre-pass phase, an overlap of each of the plurality of pieces of geometry with each of a plurality of screen regions, the analysis pre-pass phase being performed during a frame period;
generating information about the plurality of pieces of geometry and their relationships to the plurality of screen regions based on the overlap of each of the plurality of pieces of geometry with each of the plurality of screen regions, the generating of the information being performed during the frame period; and
dynamically assigning the plurality of screen regions to the plurality of GPUs based on the information, for rendering the plurality of pieces of geometry during a subsequent phase of rendering performed during the frame period.

The computer system of claim 24, wherein, in the method, the analysis pre-pass phase is performed using a vertex shader or a compute shader.

The computer system of claim 24, wherein, in the method, determining the overlap includes estimating the overlap of each of the plurality of pieces of geometry with each of the plurality of screen regions.