
CN115298686B - System and method for efficient multi-GPU rendering of geometry by pre-testing against interleaved screen regions prior to rendering - Google Patents


Info

Publication number
CN115298686B
Authority
CN
China
Prior art keywords
gpu, rendering, geometry, gpus, information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202180023019.6A
Other languages
Chinese (zh)
Other versions
CN115298686A (en)
Inventor
M. E. Cerny
F. Strauss
T. Berghoff
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sony Interactive Entertainment Inc
Original Assignee
Sony Interactive Entertainment Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US16/780,745 (US12112394B2)
Priority claimed from US16/780,680 (US11263718B2)
Priority claimed from US16/780,722 (US11080814B1)
Application filed by Sony Interactive Entertainment Inc
Publication of CN115298686A
Application granted
Publication of CN115298686B

Classifications

    • A HUMAN NECESSITIES
    • A63 SPORTS; GAMES; AMUSEMENTS
    • A63F CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00 Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/50 Controlling the output signals based on the game progress
    • A63F13/52 Controlling the output signals based on the game progress involving aspects of the displayed game scene
    • A HUMAN NECESSITIES
    • A63 SPORTS; GAMES; AMUSEMENTS
    • A63F CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00 Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/30 Interconnection arrangements between game servers and game devices; Interconnection arrangements between game devices; Interconnection arrangements between game servers
    • A63F13/35 Details of game servers
    • A63F13/355 Performing operations on behalf of clients with restricted processing capabilities, e.g. servers transform changing game scene into an encoded video stream for transmitting to a mobile phone or a thin client
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/20 Processor architectures; Processor configuration, e.g. pipelining
    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09G ARRANGEMENTS OR CIRCUITS FOR CONTROL OF INDICATING DEVICES USING STATIC MEANS TO PRESENT VARIABLE INFORMATION
    • G09G5/00 Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators
    • G09G5/36 Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators characterised by the display of a graphic pattern, e.g. using an all-points-addressable [APA] memory
    • G09G5/363 Graphics controllers
    • A HUMAN NECESSITIES
    • A63 SPORTS; GAMES; AMUSEMENTS
    • A63F CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F2300/00 Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
    • A63F2300/20 Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game characterised by details of the game platform
    • A63F2300/203 Image generating hardware
    • A HUMAN NECESSITIES
    • A63 SPORTS; GAMES; AMUSEMENTS
    • A63F CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F2300/00 Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
    • A63F2300/50 Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game characterized by details of game servers
    • A63F2300/55 Details of game data or player data management
    • A63F2300/5593 Details of game data or player data management involving scheduling aspects
    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09G ARRANGEMENTS OR CIRCUITS FOR CONTROL OF INDICATING DEVICES USING STATIC MEANS TO PRESENT VARIABLE INFORMATION
    • G09G2360/00 Aspects of the architecture of display systems
    • G09G2360/06 Use of more than one graphics processor to process data before displaying to one or more screens
    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09G ARRANGEMENTS OR CIRCUITS FOR CONTROL OF INDICATING DEVICES USING STATIC MEANS TO PRESENT VARIABLE INFORMATION
    • G09G2360/00 Aspects of the architecture of display systems
    • G09G2360/08 Power processing, i.e. workload management for processors involved in display operations, such as CPUs or GPUs
    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09G ARRANGEMENTS OR CIRCUITS FOR CONTROL OF INDICATING DEVICES USING STATIC MEANS TO PRESENT VARIABLE INFORMATION
    • G09G2360/00 Aspects of the architecture of display systems
    • G09G2360/12 Frame memory handling
    • G09G2360/122 Tiling

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Graphics (AREA)
  • Computer Hardware Design (AREA)
  • Image Generation (AREA)
  • Processing Or Creating Images (AREA)

Abstract

A method for graphics processing. The method includes rendering graphics for an application using a plurality of Graphics Processing Units (GPUs). The method includes dividing responsibilities for rendering geometry of the graphics among the plurality of GPUs based on a plurality of interleaved screen regions, each GPU having a corresponding division of responsibilities known to the plurality of GPUs. The method includes assigning geometries of image frames generated by the application to the GPUs for geometry pre-testing. The method includes performing the geometry pre-test at the GPUs to generate information about each geometry and its relationship to each of the plurality of screen regions. The method includes using the information at each of the plurality of GPUs in rendering the image frames.

Description

System and method for efficient multi-GPU rendering of geometry by pre-testing against interleaved screen regions prior to rendering
Technical Field
The present disclosure relates to graphics processing, and more particularly to multi-GPU collaboration in rendering images for applications.
Background
In recent years there has been a continual push for online services that allow online or cloud gaming in a streaming format between a cloud gaming server and a client connected through a network. The streaming format is increasingly popular because of the availability of game titles on demand, the ability to execute more complex games, the ability to network players for multiplayer games, asset sharing between players, instant experience sharing between players and/or spectators, the ability for friends to watch a friend play a video game, the ability for a friend to join a friend's ongoing game play, and so on.
The cloud game server may be configured to provide resources to one or more clients and/or applications. That is, the cloud game server may be configured with resources capable of high throughput. For example, the performance that a single Graphics Processing Unit (GPU) can achieve is limited. To render more complex scenes or use more complex algorithms (e.g., materials, lighting, etc.) when generating scenes, it may be necessary to use multiple GPUs to render a single image. However, equal use of these graphics processing units is difficult to achieve. Furthermore, even if multiple GPUs process images for an application using conventional techniques, the corresponding increased screen pixel count and geometry density cannot be supported (e.g., four GPUs cannot write four times the pixels and/or process four times the vertices or primitives of the image).
It is against this background that embodiments of the present disclosure have emerged.
Disclosure of Invention
Embodiments of the present disclosure relate to rendering a single image using multiple GPUs in concert, such as multi-GPU rendering of geometry for an application by pre-testing against potentially interleaved screen regions prior to rendering.
Embodiments of the present disclosure disclose a method for graphics processing. The method includes rendering graphics for an application using a plurality of Graphics Processing Units (GPUs). The method includes dividing responsibilities for rendering geometry of the graphics among the plurality of GPUs based on a plurality of screen regions, each GPU having a corresponding division of responsibilities known to the plurality of GPUs, wherein the screen regions are interleaved. The method includes assigning geometries of an image frame generated by the application to the plurality of GPUs for geometry testing. The method includes performing the geometry test at each GPU to generate information about each geometry and its relationship to each of the plurality of screen regions. The method includes rendering the geometry using the information at each of the plurality of GPUs, wherein using the information may include, for example, skipping rendering entirely when it has been determined that a geometry does not overlap any screen region allocated to a given GPU.
In another embodiment, a non-transitory computer-readable medium for performing a method is disclosed. The computer-readable medium includes program instructions for rendering graphics for an application using a plurality of Graphics Processing Units (GPUs). The computer-readable medium includes program instructions for dividing responsibilities for rendering geometry of the graphics among the plurality of GPUs based on a plurality of screen regions, each GPU having a corresponding division of responsibilities known to the plurality of GPUs, wherein the screen regions of the plurality of screen regions are interleaved. The computer-readable medium includes program instructions for assigning geometries of image frames generated by the application to the GPUs for geometry pre-testing. The computer-readable medium includes program instructions for performing the geometry pre-test at the GPUs to generate information about each geometry and its relationship to each of the plurality of screen regions. The computer-readable medium includes program instructions for using the information at each of the plurality of GPUs in rendering the image frames.
In another embodiment, a computer system is disclosed that includes a processor and a memory coupled to the processor, the memory having instructions stored therein that, if executed by the computer system, cause the computer system to perform a method for graphics processing. The method includes rendering graphics for an application using a plurality of Graphics Processing Units (GPUs). The method includes dividing responsibilities for rendering geometry of the graphics among the plurality of GPUs based on a plurality of screen regions, each GPU having a corresponding division of responsibilities known to the plurality of GPUs, wherein the screen regions of the plurality of screen regions are interleaved. The method includes assigning geometries of image frames generated by the application to the GPUs for geometry pre-testing. The method includes performing the geometry pre-test at the GPUs to generate information about each geometry and its relationship to each of the plurality of screen regions. The method includes using the information at each of the plurality of GPUs in rendering the image frames.
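As an illustrative sketch only (not part of the claimed method), the interleaved division of responsibility and the pre-test can be expressed in Python. The region size, the checkerboard-style ownership function, and the bounding-box overlap test are all invented for this example; a real implementation would run the test in GPU shaders.

```python
NUM_GPUS = 4
REGION_SIZE = 64  # pixels per square screen region (illustrative choice)

def owner_of_region(rx, ry):
    # Interleave region ownership so each GPU's regions are spread across
    # the whole screen rather than clustered in one quadrant.
    return (rx + ry) % NUM_GPUS

def regions_overlapping(bbox):
    # Yield the (rx, ry) region coordinates touched by a screen-space
    # bounding box (x0, y0, x1, y1), given in pixels.
    x0, y0, x1, y1 = bbox
    for ry in range(y0 // REGION_SIZE, y1 // REGION_SIZE + 1):
        for rx in range(x0 // REGION_SIZE, x1 // REGION_SIZE + 1):
            yield rx, ry

def pre_test(bbox):
    # Geometry pre-test: which GPUs own at least one region that this
    # geometry's bounding box overlaps?
    return {owner_of_region(rx, ry) for rx, ry in regions_overlapping(bbox)}

def render_on_gpu(gpu, tested_geometry):
    # A GPU renders only geometry whose pre-test information shows an
    # overlap with one of its own regions; everything else is skipped.
    return [name for name, gpus in tested_geometry if gpu in gpus]

# A small object touches one region (one GPU); a screen-wide object
# touches regions owned by every GPU.
tested = [(name, pre_test(bbox)) for name, bbox in
          [("tree", (0, 0, 60, 60)), ("sky", (0, 0, 1919, 199))]]
```

With this invented ownership pattern, only the GPU owning the small object's region spends any time on it, while all GPUs render the screen-wide object.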
Embodiments of the present disclosure disclose a method for graphics processing. The method includes rendering graphics for an application using a plurality of Graphics Processing Units (GPUs). The method includes dividing responsibilities for rendering geometry of the graphics among the plurality of GPUs based on a plurality of screen regions, each GPU having a corresponding division of responsibilities known to the plurality of GPUs. The method includes performing, at a pre-test GPU, a geometry test on a plurality of geometries of an image frame generated by the application to generate information about each geometry and its relationship to each of the plurality of screen regions. The method includes rendering, at each of the plurality of GPUs, the plurality of geometries using the information generated for each of the plurality of geometries, wherein using the information may include, for example, skipping rendering entirely when it has been determined that a geometry does not overlap any screen region allocated to a given GPU.
In another embodiment, a non-transitory computer-readable medium for performing a method is disclosed. The computer-readable medium includes program instructions for rendering graphics for an application using a plurality of Graphics Processing Units (GPUs). The computer-readable medium includes program instructions for dividing responsibilities for rendering geometry of the graphics among the plurality of GPUs based on a plurality of screen regions, each GPU having a corresponding division of responsibilities known to the plurality of GPUs. The computer-readable medium includes program instructions for performing, at a pre-test GPU, a geometry test on a plurality of geometries of an image frame generated by the application to generate information about each geometry and its relationship to each of the plurality of screen regions. The computer-readable medium includes program instructions for rendering, at each of the plurality of GPUs, the plurality of geometries using the information generated for each of the plurality of geometries, wherein using the information may include, for example, skipping rendering entirely when it has been determined that a geometry does not overlap any screen region allocated to a given GPU.
In another embodiment, a computer system is disclosed that includes a processor and a memory coupled to the processor, the memory having instructions stored therein that, if executed by the computer system, cause the computer system to perform a method for graphics processing. The method includes rendering graphics for an application using a plurality of Graphics Processing Units (GPUs). The method includes dividing responsibilities for rendering geometry of the graphics among the plurality of GPUs based on a plurality of screen regions, each GPU having a corresponding division of responsibilities known to the plurality of GPUs. The method includes performing, at a pre-test GPU, a geometry test on a plurality of geometries of an image frame generated by the application to generate information about each geometry and its relationship to each of the plurality of screen regions. The method includes rendering, at each of the plurality of GPUs, the plurality of geometries using the information generated for each of the plurality of geometries, wherein using the information may include, for example, skipping rendering entirely when it has been determined that a geometry does not overlap any screen region allocated to a given GPU.
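A minimal sketch of this variant, in which one designated pre-test GPU produces a shared overlap table that every rendering GPU consumes. The names and the overlap representation are invented, and the per-geometry region sets are given directly for brevity; a real pre-test would derive them from the geometry itself.

```python
def build_overlap_info(geometry_regions):
    # Runs once, on the designated pre-test GPU: for each geometry, record
    # the set of screen regions it overlaps. This table is the "information"
    # shared with all rendering GPUs.
    return {name: frozenset(regions) for name, regions in geometry_regions.items()}

def render_pass(own_regions, overlap_info):
    # Runs on each rendering GPU: consume the shared table and skip any
    # geometry that overlaps none of this GPU's assigned regions.
    rendered, skipped = [], []
    for name, regions in overlap_info.items():
        (rendered if regions & own_regions else skipped).append(name)
    return rendered, skipped

info = build_overlap_info({
    "hud":     {0, 1},        # overlaps screen regions 0 and 1
    "terrain": {0, 1, 2, 3},  # overlaps every region
    "moon":    {3},           # small object, one region
})
```

Here the GPU responsible for region 3 renders "terrain" and "moon" but skips "hud" entirely, without ever testing the geometry itself.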
Embodiments of the present disclosure disclose a method for graphics processing. The method includes rendering graphics for an application using a plurality of Graphics Processing Units (GPUs). The method includes dividing responsibilities for rendering geometry of the graphics among the plurality of GPUs based on a plurality of screen regions, each GPU having a corresponding division of responsibilities known to the plurality of GPUs. The method includes rendering a first plurality of geometries at the plurality of GPUs during a rendering phase of a previous image frame generated by the application. The method includes generating statistics for the rendering of the previous image frame. The method includes assigning, based on the statistics, a second plurality of geometries of a current image frame generated by the application to the plurality of GPUs for geometry testing. The method includes performing the geometry test on the second plurality of geometries of the current image frame to generate information about each of the second plurality of geometries and its relationship to each of the plurality of screen regions, wherein the geometry test is performed at each of the plurality of GPUs based on the assignment. The method includes rendering, at each of the plurality of GPUs, the second plurality of geometries using the information generated for each of the second plurality of geometries, wherein using the information may include, for example, skipping rendering entirely when it has been determined that a geometry does not overlap any screen region allocated to a given GPU.
In another embodiment, a non-transitory computer-readable medium for performing a method is disclosed. The computer-readable medium includes program instructions for rendering graphics for an application using a plurality of Graphics Processing Units (GPUs). The computer-readable medium includes program instructions for dividing responsibilities for rendering geometry of the graphics among the plurality of GPUs based on a plurality of screen regions, each GPU having a corresponding division of responsibilities known to the plurality of GPUs. The computer-readable medium includes program instructions for rendering a first plurality of geometries at the plurality of GPUs during a rendering phase of a previous image frame generated by the application. The computer-readable medium includes program instructions for generating statistics for the rendering of the previous image frame. The computer-readable medium includes program instructions for assigning, based on the statistics, a second plurality of geometries of a current image frame generated by the application to the plurality of GPUs for geometry testing. The computer-readable medium includes program instructions for performing the geometry test on the second plurality of geometries of the current image frame to generate information about each of the second plurality of geometries and its relationship to each of the plurality of screen regions, wherein the geometry test is performed at each of the plurality of GPUs based on the assignment. The computer-readable medium includes program instructions for rendering, at each of the plurality of GPUs, the second plurality of geometries using the information generated for each of the second plurality of geometries, wherein using the information may include, for example, skipping rendering entirely when it has been determined that a geometry does not overlap any screen region allocated to a given GPU.
In another embodiment, a computer system is disclosed that includes a processor and a memory coupled to the processor, the memory having instructions stored therein that, if executed by the computer system, cause the computer system to perform a method for graphics processing. The method includes rendering graphics for an application using a plurality of Graphics Processing Units (GPUs). The method includes dividing responsibilities for rendering geometry of the graphics among the plurality of GPUs based on a plurality of screen regions, each GPU having a corresponding division of responsibilities known to the plurality of GPUs. The method includes rendering a first plurality of geometries at the plurality of GPUs during a rendering phase of a previous image frame generated by the application. The method includes generating statistics for the rendering of the previous image frame. The method includes assigning, based on the statistics, a second plurality of geometries of a current image frame generated by the application to the plurality of GPUs for geometry testing. The method includes performing the geometry test on the second plurality of geometries of the current image frame to generate information about each of the second plurality of geometries and its relationship to each of the plurality of screen regions, wherein the geometry test is performed at each of the plurality of GPUs based on the assignment. The method includes rendering, at each of the plurality of GPUs, the second plurality of geometries using the information generated for each of the second plurality of geometries, wherein using the information may include, for example, skipping rendering entirely when it has been determined that a geometry does not overlap any screen region allocated to a given GPU.
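One way the statistics-driven assignment might look, sketched under invented assumptions: here the statistic is per-GPU render time from the previous frame, and the policy (inverse-load proportional split) is an illustrative heuristic, not the patent's specific scheme.

```python
def assign_by_statistics(num_geometries, prev_frame_ms):
    # Split the current frame's geometry among GPUs using timing statistics
    # from the previous frame: a GPU that took longer last frame is given
    # proportionally less geometry to pre-test this frame.
    weights = [1.0 / ms for ms in prev_frame_ms]
    total = sum(weights)
    counts = [int(num_geometries * w / total) for w in weights]
    counts[-1] += num_geometries - sum(counts)  # hand rounding remainder to last GPU
    return counts
```

With equal previous-frame loads the split is even; with uneven loads the lightly loaded GPUs receive more pre-test work, keeping the GPUs closer to equally busy.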
Embodiments of the present disclosure disclose a method for graphics processing. The method includes rendering graphics for an application using a plurality of Graphics Processing Units (GPUs). The method includes dividing responsibilities for rendering geometry of the graphics among the plurality of GPUs based on a plurality of screen regions, each GPU having a corresponding division of responsibilities known to the plurality of GPUs. The method includes assigning a plurality of geometries of an image frame to the plurality of GPUs for geometry testing. The method includes setting a first state to configure one or more shaders to perform the geometry testing. The method includes performing, at the plurality of GPUs, the geometry testing on the plurality of geometries to generate information about each geometry and its relationship to each of the plurality of screen regions. The method includes setting a second state to configure the one or more shaders to perform rendering. The method includes rendering, at each of the plurality of GPUs, the plurality of geometries using the information generated for each of the plurality of geometries, wherein using the information may include, for example, skipping rendering entirely when it has been determined that a geometry does not overlap any screen region allocated to a given GPU.
In another embodiment, a non-transitory computer-readable medium for performing a method is disclosed. The computer-readable medium includes program instructions for rendering graphics for an application using a plurality of Graphics Processing Units (GPUs). The computer-readable medium includes program instructions for dividing responsibilities for rendering geometry of the graphics among the plurality of GPUs based on a plurality of screen regions, each GPU having a corresponding division of responsibilities known to the plurality of GPUs. The computer-readable medium includes program instructions for assigning a plurality of geometries of an image frame to the plurality of GPUs for geometry testing. The computer-readable medium includes program instructions for setting a first state to configure one or more shaders to perform the geometry testing. The computer-readable medium includes program instructions for performing, at the plurality of GPUs, the geometry testing on the plurality of geometries to generate information about each geometry and its relationship to each of the plurality of screen regions. The computer-readable medium includes program instructions for setting a second state to configure the one or more shaders to perform rendering. The computer-readable medium includes program instructions for rendering, at each of the plurality of GPUs, the plurality of geometries using the information generated for each of the plurality of geometries, wherein using the information may include, for example, skipping rendering entirely when it has been determined that a geometry does not overlap any screen region allocated to a given GPU.
In another embodiment, a computer system is disclosed that includes a processor and a memory coupled to the processor, the memory having instructions stored therein that, if executed by the computer system, cause the computer system to perform a method for graphics processing. The method includes rendering graphics for an application using a plurality of Graphics Processing Units (GPUs). The method includes dividing responsibilities for rendering geometry of the graphics among the plurality of GPUs based on a plurality of screen regions, each GPU having a corresponding division of responsibilities known to the plurality of GPUs. The method includes assigning a plurality of geometries of an image frame to the plurality of GPUs for geometry testing. The method includes setting a first state to configure one or more shaders to perform the geometry testing. The method includes performing, at the plurality of GPUs, the geometry testing on the plurality of geometries to generate information about each geometry and its relationship to each of the plurality of screen regions. The method includes setting a second state to configure the one or more shaders to perform rendering. The method includes rendering, at each of the plurality of GPUs, the plurality of geometries using the information generated for each of the plurality of geometries, wherein using the information may include, for example, skipping rendering entirely when it has been determined that a geometry does not overlap any screen region allocated to a given GPU.
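The two-state flow above can be sketched as an ordered command list. The command names and tuple encoding are invented for illustration; a real command buffer would carry GPU pipeline-state objects rather than strings.

```python
def build_command_buffer(geometries):
    # Encode the two-phase flow: a first state configures shaders for the
    # geometry pre-test, a second state reconfigures them for full
    # rendering that consumes the test results.
    cmds = [("set_state", "pre_test")]                       # first state: test-only shaders
    cmds += [("dispatch_test", g) for g in geometries]       # pre-test every geometry
    cmds.append(("set_state", "render"))                     # second state: render shaders
    cmds += [("draw_if_overlapping", g) for g in geometries] # render, skipping non-overlaps
    return cmds
```

Because every GPU replays the same command list, the single state change per phase amortizes shader reconfiguration across all geometry in the frame.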
Embodiments of the present disclosure disclose a method for graphics processing. The method includes rendering graphics for an application using a plurality of Graphics Processing Units (GPUs). The method includes dividing responsibilities for rendering geometry of the graphics among the plurality of GPUs based on a plurality of screen regions, each GPU having a corresponding division of responsibilities known to the plurality of GPUs. The method includes assigning a plurality of geometries of an image frame to the plurality of GPUs for geometry testing. The method includes interleaving a first set of shaders performing geometry testing and rendering on a first group of geometries with a second set of shaders performing geometry testing and rendering on a second group of geometries. The geometry testing generates corresponding information about each geometry in the first or second group and its relationship to each of the plurality of screen regions. The plurality of GPUs render each geometry in the first or second group using the corresponding information, wherein using the information may include, for example, skipping rendering entirely when it has been determined that a geometry does not overlap any screen region allocated to a given GPU.
In another embodiment, a non-transitory computer-readable medium for performing a method is disclosed. The computer-readable medium includes program instructions for rendering graphics for an application using a plurality of Graphics Processing Units (GPUs). The computer-readable medium includes program instructions for dividing responsibilities for rendering geometry of the graphics among the plurality of GPUs based on a plurality of screen regions, each GPU having a corresponding division of responsibilities known to the plurality of GPUs. The computer-readable medium includes program instructions for assigning a plurality of geometries of an image frame to the plurality of GPUs for geometry testing. The computer-readable medium includes program instructions for interleaving a first set of shaders performing geometry testing and rendering on a first group of geometries with a second set of shaders performing geometry testing and rendering on a second group of geometries. The geometry testing generates corresponding information about each geometry in the first or second group and its relationship to each of the plurality of screen regions. The plurality of GPUs render each geometry in the first or second group using the corresponding information, wherein using the information may include, for example, skipping rendering entirely when it has been determined that a geometry does not overlap any screen region allocated to a given GPU.
In another embodiment, a computer system is disclosed that includes a processor and a memory coupled to the processor, the memory having instructions stored therein that, if executed by the computer system, cause the computer system to perform a method for graphics processing. The method includes rendering graphics for an application using a plurality of Graphics Processing Units (GPUs). The method includes dividing responsibilities for rendering geometry of the graphics among the plurality of GPUs based on a plurality of screen regions, each GPU having a corresponding division of responsibilities known to the plurality of GPUs. The method includes assigning a plurality of geometries of an image frame to the plurality of GPUs for geometry testing. The method includes interleaving a first set of shaders performing geometry testing and rendering on a first group of geometries with a second set of shaders performing geometry testing and rendering on a second group of geometries. The geometry testing generates corresponding information about each geometry in the first or second group and its relationship to each of the plurality of screen regions. The plurality of GPUs render each geometry in the first or second group using the corresponding information, wherein using the information may include, for example, skipping rendering entirely when it has been determined that a geometry does not overlap any screen region allocated to a given GPU.
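A sketch of the interleaving idea, under invented assumptions: geometry is split into batches, and the schedule alternates test and render commands across successive batches so the two kinds of shader work can overlap on hardware instead of running as two monolithic phases. The trace format is illustrative.

```python
def interleaved_schedule(batches):
    # Issue the pre-test of batch i+1 immediately after the render of
    # batch i, alternating test-shader and render-shader work. Each batch
    # is still tested before it is rendered.
    trace = [("test", batches[0])]
    for current, nxt in zip(batches, batches[1:]):
        trace.append(("render", current))
        trace.append(("test", nxt))
    trace.append(("render", batches[-1]))
    return trace
```

In the resulting trace every batch's test precedes its render, while test and render commands from different batches alternate, which is the overlap the interleaving is meant to expose.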
Other aspects of the present disclosure will become apparent from the following detailed description, taken in conjunction with the accompanying drawings, illustrating by way of example the principles of the disclosure.
Drawings
The disclosure may be best understood by reference to the following description taken in conjunction with the accompanying drawings in which:
Fig. 1 is a schematic diagram of a system for providing games over a network between one or more cloud gaming servers and clients, the servers being configured to enable multiple GPUs (graphics processing units) to cooperate in rendering a single image, including multi-GPU rendering of geometry for an application by pre-testing the geometry against screen regions that may be interleaved, according to an embodiment of the present disclosure.
Fig. 2 is a schematic diagram of a multi-GPU architecture in which multiple GPUs cooperate to render a single image, according to one embodiment of the present disclosure.
FIG. 3 is a schematic diagram of multiple graphics processing unit resources configured for multi-GPU rendering of geometry for an application by pre-testing the geometry against screen regions that may be interleaved, according to one embodiment of the present disclosure.
Fig. 4 is a schematic diagram of a rendering architecture of a graphics pipeline configured for multiple GPU processing such that multiple GPUs cooperate to render a single image, according to an implementation of the present disclosure.
Fig. 5 is a flowchart illustrating a method for graphics processing including multi-GPU rendering of geometry for an application by pre-testing for interlaced screen regions prior to rendering, according to one embodiment of the present disclosure.
Fig. 6A is a schematic diagram of a screen subdivided into quadrants when performing multiple GPU rendering according to one embodiment of the present disclosure.
Fig. 6B is a schematic diagram of a screen subdivided into a plurality of interleaved regions when performing multiple GPU rendering according to one embodiment of the present disclosure.
Fig. 7A is a schematic diagram of a rendering command buffer shared by multiple GPUs that cooperate to render a single image frame, the command buffer including a geometry pre-test portion and a rendering portion, according to one embodiment of the present disclosure.
Fig. 7B-1 illustrates an image including four objects rendered by multiple GPUs, and shows screen area responsibilities of each GPU when rendering objects of the image, according to one embodiment of the present disclosure.
Fig. 7B-2 is a table illustrating rendering performed by each GPU when rendering the four objects of fig. 7B-1, according to one embodiment of the present disclosure.
FIG. 7C is a schematic diagram illustrating execution of geometry pre-testing and geometry rendering by one or more GPUs when rendering an image frame (e.g., the image of FIG. 7B-1) by cooperation of the plurality of GPUs, according to one embodiment of the disclosure.
Fig. 8A illustrates object testing of screen regions when multiple GPUs cooperatively render a single image, according to one embodiment of the present disclosure.
Fig. 8B illustrates testing of portions of objects of a screen region when multiple GPUs cooperatively render a single image, according to one embodiment of the present disclosure.
Fig. 9A-9C illustrate various policies for assigning screen regions to corresponding GPUs when multiple GPUs cooperatively render a single image, according to one embodiment of the disclosure.
Fig. 10 is a schematic diagram illustrating various distributions of GPU assignments for performing geometry pre-testing on multiple geometries, according to an embodiment of the present disclosure.
Fig. 11A is a schematic diagram illustrating geometry pre-testing and rendering of a previous image frame by multiple GPUs, with statistics collected during that rendering used to influence how pre-testing of the geometry of the current image frame is allocated among the multiple GPUs, according to one embodiment of the present disclosure.
Fig. 11B is a flowchart illustrating a graphics processing method according to one embodiment of the present disclosure, including geometry pre-testing and rendering of a previous image frame by a plurality of GPUs, and using statistics collected during that rendering to influence how pre-testing of the geometry of the current image frame is distributed among the plurality of GPUs.
Fig. 12A is a schematic diagram illustrating the use of a shader configured to perform both pre-testing and rendering of geometry of an image frame in two passes through a portion of a command buffer, according to one embodiment of the present disclosure.
Fig. 12B is a flow chart illustrating a graphics processing method according to one embodiment of the present disclosure, including performing both pre-testing and rendering of the geometry of an image frame using the same set of shaders in two passes through a portion of a command buffer.
Fig. 13A is a schematic diagram illustrating the use of shaders configured to perform both geometry testing and rendering, where the geometry testing and rendering performed for different sets of geometries are interleaved using separate portions of corresponding command buffers, according to one embodiment of the present disclosure.
Fig. 13B is a flow chart illustrating a graphics processing method according to one embodiment of the present disclosure, including interleaving pre-testing and rendering of the geometry of an image frame for different sets of geometries using separate portions of corresponding command buffers.
FIG. 14 illustrates components of an example apparatus that may be used to perform aspects of various embodiments of the present disclosure.
Detailed Description
Although the following detailed description contains many specific details for the purposes of illustration, persons of ordinary skill in the art will appreciate that many variations and alterations to the following details are within the scope of the disclosure. Accordingly, the various aspects of the disclosure described below are set forth without any loss of generality to, and without imposing limitations upon, the claims appended hereto.
In general, the performance a single GPU can achieve is limited, e.g., by how large the GPU can be made. In embodiments of the present disclosure, it is desirable to use multiple GPUs to render a single image in order to render more complex scenes or to use more complex algorithms (e.g., materials, lighting, etc.). In particular, various embodiments of the present disclosure describe methods and systems configured for performing multiple GPU rendering of geometry for an application by pre-testing the geometry against screen regions that may be interlaced. The multiple GPUs cooperate to generate an image. Rendering responsibilities are divided among the multiple GPUs based on screen regions. Prior to rendering the geometry, the GPUs may generate information about each geometry and its relationship to the screen regions. This allows a GPU to render the geometry more efficiently or to avoid rendering it altogether. As one advantage, for example, this allows multiple GPUs to render more complex scenes and/or images in the same amount of time.
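By way of a hypothetical illustration only (a minimal sketch, not the patent's actual implementation; the function names and the rectangle-based region layout are assumptions made for this example), the pre-test described above can be modeled as testing a geometry's screen-space bounding box against each screen region, after which a GPU skips rendering entirely when none of its assigned regions are overlapped:

```python
# Hypothetical sketch of the pre-test and skip-rendering decision: each GPU
# owns a set of screen regions, the pre-test records which regions a piece of
# geometry overlaps, and a GPU skips any geometry touching none of its regions.

def pretest_geometry(bounds, regions):
    """Return the ids of regions whose rectangle overlaps the geometry's
    screen-space bounding box. bounds and regions use (x0, y0, x1, y1)."""
    overlapped = set()
    gx0, gy0, gx1, gy1 = bounds
    for region_id, (rx0, ry0, rx1, ry1) in regions.items():
        if gx0 < rx1 and gx1 > rx0 and gy0 < ry1 and gy1 > ry0:
            overlapped.add(region_id)
    return overlapped

def should_render(overlapped_regions, gpu_regions):
    """A GPU renders a geometry only if the pre-test found overlap with at
    least one screen region assigned to that GPU."""
    return bool(overlapped_regions & gpu_regions)

# Example: a geometry overlapping only region 0 is rendered by the GPU that
# owns region 0 and skipped by the GPU that owns region 1.
regions = {0: (0, 0, 100, 100), 1: (100, 0, 200, 100)}
hits = pretest_geometry((10, 10, 50, 50), regions)
```

Because each GPU knows the full division of responsibilities, the same pre-test information can be shared among all GPUs, and each GPU applies its own `should_render` decision.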
With the above general understanding of the various embodiments, example details of the embodiments will now be described with reference to various figures.
Throughout this specification, references to "application" or "game" or "video game" or "game application" are intended to represent any type of interactive application that is guided by execution of input commands. For illustrative purposes only, interactive applications include applications for gaming, word processing, video game processing, and the like. Furthermore, the terms introduced above are interchangeable.
Throughout this specification, various embodiments of the disclosure are described for multi-GPU processing or rendering of geometry for an application using an exemplary architecture with four GPUs. However, it should be appreciated that any number of GPUs (e.g., two or more GPUs) may cooperate in rendering geometry for an application.
Fig. 1 is a schematic diagram of a system for performing multiple GPU processing when rendering an image (e.g., an image frame) for an application, according to one embodiment of the present disclosure. According to embodiments of the present disclosure, the system is configured to provide games between one or more cloud game servers over a network, and more particularly to the collaboration of multiple GPUs to render a single image of an application. Cloud gaming involves executing a video game on a server to generate game-rendered video frames, which are then sent to a client for display. In particular, the system 100 is configured for efficient multi-GPU rendering of geometry for applications by pre-testing for potentially interlaced screen regions prior to rendering.
While fig. 1 illustrates multiple GPU rendering of geometry implemented between one or more cloud game servers of a cloud game system, other embodiments of the present disclosure provide efficient multiple GPU rendering of geometry for applications by performing region testing within a separate system (such as a personal computer or game console that includes a high-end graphics card with multiple GPUs) at the time of rendering.
It should also be appreciated that in various embodiments (e.g., in a cloud gaming environment or in a stand-alone system), multiple GPU rendering of geometry may be performed using a physical GPU or a virtual GPU, or a combination of both. For example, a virtual machine (e.g., an instance) may be created using a hypervisor that utilizes host hardware (e.g., located at a data center) of one or more components of the hardware layer (such as multiple CPUs, memory modules, GPUs, network interfaces, communication components, etc.). These physical resources may be arranged in racks, such as CPU racks, GPU racks, memory racks, etc., where the physical resources in the racks may be accessed using a top-of-rack switch that facilitates the structure for assembling and accessing components for the instance (e.g., when building virtualized components of the instance). Typically, the hypervisor may present multiple guest operating systems configured with multiple instances of virtual resources. That is, each operating system may be configured with a corresponding set of virtualized resources supported by one or more hardware resources (e.g., located at a corresponding data center). For example, each operating system may be supported by a virtual CPU, multiple virtual GPUs, virtual memory, virtualized communication component, and so forth. Furthermore, the configuration of one instance may be transferred from one data center to another data center to reduce latency. GPU utilization defined for a user or game may be used when saving a user's game session. GPU utilization may include any number of configurations described herein to optimize the fast rendering of video frames of a gaming session. In one embodiment, GPU utilization defined for a game or user may be transferred between data centers as a configurable setting. 
The ability to transfer GPU utilization settings allows game play to migrate efficiently from data center to data center as the user connects for game play from different geographic locations.
According to one embodiment of the present disclosure, the system 100 provides a game via a cloud gaming network 190, wherein the game is executed by a client device 110 (e.g., a thin client) that is remote from the corresponding user that is playing the game. The system 100 may provide game control to one or more users playing one or more games over the cloud gaming network 190 in single-player or multi-player mode via the network 150. In some embodiments, cloud gaming network 190 may include a plurality of Virtual Machines (VMs) running on a hypervisor of a host, wherein one or more of the virtual machines are configured to execute a game processor module utilizing hardware resources available to the hypervisor of the host. Network 150 may include one or more communication technologies. In some embodiments, the network 150 may include fifth generation (5G) network technology with advanced wireless communication systems.
In some implementations, wireless technology may be used to facilitate communications. Such techniques may include, for example, 5G wireless communication techniques. 5G is a fifth generation cellular network technology. A 5G network is a digital cellular network in which the service area covered by a provider is divided into small geographical areas called cells. Analog signals representing sound and images are digitized in the telephone, converted by analog-to-digital converters and transmitted as a bit stream. All 5G wireless devices in a cell communicate over radio waves with a local antenna array and low power automatic transceivers (transmitters and receivers) in the cell through frequency channels allocated by the transceivers from a pool of frequencies reused in other cells. The local antenna is connected to the telephone network and the internet through a high bandwidth optical fiber or wireless backhaul connection. As in other cell networks, a mobile device that crosses from one cell to another will automatically go to the new cell. It should be understood that 5G networks are merely exemplary types of communication networks, and that embodiments of the present disclosure may utilize earlier generation wireless or wired communications, as well as newer generation wired or wireless technologies after 5G.
As shown, cloud gaming network 190 includes a game server 160 that provides access to a plurality of video games. The game server 160 may be any type of server computing device available in the cloud and may be configured as one or more virtual machines executing on one or more hosts. For example, the game server 160 may manage virtual machines that support game processors that instantiate game instances for users. In this manner, the plurality of game processors of the game server 160 associated with the plurality of virtual machines are configured to execute a plurality of instances of one or more games associated with game play by the plurality of users. In this manner, the backend server supports streaming of media (e.g., video, audio, etc.) that provides game play of multiple game applications to multiple corresponding users. That is, the game server 160 is configured to stream data (e.g., rendered images and/or frames of corresponding game play) back to the corresponding client device 110 over the network 150. In this manner, a computationally complex gaming application may be executed at the backend server in response to controller inputs received and forwarded by the client device 110. Each server is capable of rendering images and/or frames, which are then encoded (e.g., compressed) and streamed to the corresponding client device for display.
For example, multiple users may access cloud gaming network 190 via communication network 150 using corresponding client devices 110 configured to receive streaming media. In one embodiment, the client device 110 may be configured as a thin client that provides an interface with a backend server (e.g., cloud gaming network 190) configured to provide computing functionality (e.g., including the game name processing engine 111). In another embodiment, the client device 110 may be configured with a game name processing engine and game logic for at least some local processing of the video game, and may also be utilized to receive streaming content generated by the video game executing at the back-end server, or other content supported by the back-end server. For local processing, the game name processing engine includes basic processor-based functionality for executing the video game and services associated with the video game. In this case, the game logic may be stored on the local client device 110 and used to execute the video game.
Each client device 110 may request access to a different game from the cloud gaming network. For example, cloud gaming network 190 may execute one or more game logics that build on game name processing engine 111 as executed using CPU resource 163 and GPU resource 365 of game server 160. For example, the game logic 115a that cooperates with the game name processing engine 111 may execute on the game server 160 for one client, the game logic 115b that cooperates with the game name processing engine 111 may execute on the game server 160 for a second client, … …, and the game logic 115N that cooperates with the game name processing engine 111 may execute on the game server 160 for an nth client.
In particular, the client device 110 of a corresponding user (not shown) is configured for requesting access to the game through a communication network 150, such as the Internet, and for rendering display images (e.g., image frames) generated by a video game executed by the game server 160, wherein the encoded images are transmitted to the client device 110 for display in association with the corresponding user. For example, a user may interact with an instance of a video game executing on a game processor of the game server 160 through the client device 110. More specifically, an instance of the video game is executed by the game name processing engine 111. Corresponding game logic (e.g., executable code) 115 implementing the video game is stored and accessible through a data store (not shown) and is used to execute the video game. The game name processing engine 111 can support multiple video games using multiple game logics (e.g., game applications), each of which is selectable by a user.
For example, the client device 110 is configured to interact with a game name processing engine 111 associated with a game play of a corresponding user, such as through input commands for driving the game play. In particular, client device 110 may receive inputs from various types of input devices, such as game controllers, tablet computers, keyboards, gestures captured by cameras, mice, touch pads, and the like. Client device 110 may be any type of computing device having at least memory and a processor module capable of connecting to game server 160 over network 150. The backend game name processing engine 111 is configured to generate a rendered image that is transmitted over the network 150 for display at a corresponding display associated with the client device 110. For example, through a cloud-based service, game rendering images may be transmitted by instances of corresponding games (e.g., game logic) executing on game execution engine 111 of game server 160. That is, the client device 110 is configured to receive encoded images (e.g., encoded from game rendering images generated by executing a video game), and to display the rendered images on the display 11. In one embodiment, the display 11 includes an HMD (e.g., displaying VR content). In some implementations, the rendered image may be streamed to a smart phone or tablet directly from a cloud-based service or via the client device 110 (e.g., Remote Play).
In one embodiment, game server 160 and/or game name processing engine 111 includes basic processor-based functionality for executing games and services associated with game applications. For example, game server 160 includes a Central Processing Unit (CPU) resource 163 and a Graphics Processing Unit (GPU) resource 365 configured to perform processor-based functions, including 2D or 3D rendering, physical simulation, scripting, audio, animation, graphics processing, lighting, shading, rasterizing, ray tracing, shading, culling, transforming, artificial intelligence, and the like. In addition, the CPU and GPU groups may implement services for gaming applications, including, in part, memory management, multi-threaded management, quality of service (QoS), bandwidth testing, social networking, social friend management, social networking communications with friends, communication channels, texting, instant messaging, chat support, and the like. In one embodiment, one or more applications share particular GPU resources. In one embodiment, multiple GPU devices may be combined to perform graphics processing for a single application executing on a corresponding CPU.
In one embodiment, cloud gaming network 190 is a distributed game server system and/or architecture. In particular, the distributed game engine executing the game logic is configured to correspond to a corresponding instance of the game. Typically, a distributed game engine takes each function of the game engine and distributes the functions for execution by multiple processing entities. Individual functions may be further distributed over one or more processing entities. The processing entity may be configured in different configurations, including physical hardware, and/or as a virtual component or virtual machine, and/or as a virtual container, where the container is different from the virtual machine in that it virtualizes instances of the gaming application running on the virtualized operating system. The processing entity may utilize and/or rely on servers and their underlying hardware on one or more servers (computing nodes) of the cloud gaming network 190, where the servers may be located on one or more racks. Coordination, allocation and management of the execution of these functions for the various processing entities is performed by the distributed synchronization layer. In this manner, execution of these functions is controlled by the distributed synchronization layer to generate media (e.g., video frames, audio, etc.) for the gaming application in response to the player's controller input. The distributed synchronization layer is able to efficiently perform (e.g., by load balancing) these functions across distributed processing entities such that critical game engine components/functions are distributed and reassembled for more efficient processing.
Fig. 2 is a schematic diagram of an exemplary multiple GPU architecture 200 in which multiple GPUs cooperate to render a single image of a corresponding application, according to one embodiment of the present disclosure. It should be appreciated that in various embodiments of the present disclosure, many architectures are possible in which multiple GPUs cooperate to render a single image, although not explicitly described or shown. For example, multi-GPU rendering of geometry for an application by performing region testing at the time of rendering may be implemented between one or more cloud game servers of a cloud game system, or may be implemented within a stand-alone system (such as a personal computer or game console that includes a high-end graphics card with multiple GPUs), and so forth.
The multiple GPU architecture 200 includes a CPU 163 and multiple GPUs configured for multiple GPU rendering for a single image of an application and/or each image in a sequence of images of an application. In particular, the CPU 163 and GPU resources 365 are configured to perform processor-based functions including 2D or 3D rendering, physical simulation, scripting, audio, animation, graphics processing, lighting, shading, rasterizing, ray tracing, shading, culling, transformation, artificial intelligence, etc., as previously described.
For example, four GPUs are shown in GPU resource 365 of multi-GPU architecture 200, although any number of GPUs may be used in rendering images for an application. Each GPU is connected to a corresponding dedicated memory, such as Random Access Memory (RAM), via a high speed bus 220. In particular, GPU-A is connected to memory 210A (e.g., RAM) via bus 220, GPU-B is connected to memory 210B (e.g., RAM) via bus 220, GPU-C is connected to memory 210C (e.g., RAM) via bus 220, and GPU-D is connected to memory 210D (e.g., RAM) via bus 220.
Furthermore, the GPUs are connected to each other via a bus 240, which, depending on the architecture, may be approximately equal in speed to or slower than the bus 220 used for communication between a GPU and its corresponding memory. For example, GPU-A is connected to each of GPU-B, GPU-C and GPU-D via bus 240. Likewise, GPU-B is connected to each of GPU-A, GPU-C and GPU-D via bus 240. In addition, GPU-C is connected to each of GPU-A, GPU-B and GPU-D via bus 240. Further, GPU-D is connected to each of GPU-A, GPU-B and GPU-C via bus 240.
The CPU 163 is connected to each GPU via a lower speed bus 230 (e.g., bus 230 is slower than bus 220 for communication between the corresponding GPU and its corresponding memory). Specifically, CPU 163 is connected to each of GPU-A, GPU-B, GPU-C and GPU-D.
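The connectivity of Fig. 2 can be summarized in a small model (an illustrative sketch only; the dictionary layout and naming are assumptions made for this example, not part of the disclosure), in which every GPU has a dedicated memory over the fast bus 220, every GPU pair is linked by bus 240, and the CPU reaches every GPU over the slower bus 230:

```python
# Illustrative model of the Fig. 2 topology: per-GPU dedicated memory on a
# fast bus, a full mesh of GPU-to-GPU links, and slower CPU-to-GPU links.
from itertools import combinations

GPUS = ["GPU-A", "GPU-B", "GPU-C", "GPU-D"]

def build_links():
    links = {}
    for gpu in GPUS:
        links[(gpu, f"RAM-{gpu[-1]}")] = "bus-220"  # fast GPU <-> dedicated memory
        links[("CPU", gpu)] = "bus-230"             # slower CPU <-> GPU
    for a, b in combinations(GPUS, 2):
        links[(a, b)] = "bus-240"                   # GPU <-> GPU, at most bus-220 speed
    return links
```

With four GPUs this yields four memory links, four CPU links, and six GPU-to-GPU links (one per pair).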
Fig. 3 is a schematic diagram of a graphics processing unit resource 365 configured for multiple GPU rendering of geometry for image frames of an application by pre-testing the geometry against possibly interlaced screen regions prior to rendering, according to one embodiment of the present disclosure. For example, game server 160 may be configured to include GPU resources 365 in cloud gaming network 190 of fig. 1. As shown, GPU resources 365 include multiple GPUs, such as GPU 365a, GPU 365b … … GPU 365n. As previously described, various architectures may include multiple GPUs that cooperate to render a single image by performing multiple GPU rendering of geometry for an application through region testing at the time of rendering, such as multiple GPU rendering of geometry implemented between one or more cloud game servers of a cloud game system, or multiple GPU rendering of geometry implemented within a stand-alone system (such as a personal computer or game console that includes a high-end graphics card with multiple GPUs), and so forth.
In particular, in one embodiment, game server 160 is configured to perform multiple GPU processing when rendering a single image of an application, such that multiple GPUs cooperate to render a single image, and/or to render each of one or more images of a sequence of images when executing an application. For example, in one embodiment, game server 160 may include a CPU and GPU group configured to perform multiple GPU rendering on each of one or more images in an image sequence of an application, where the CPU and GPU group may implement a graphics and/or rendering pipeline of the application. The CPU and GPU group may be configured as one or more processing devices. As previously described, the CPU and GPU group may include CPU 163 and GPU resources 365 configured to perform processor-based functions, including 2D or 3D rendering, physical simulation, scripting, audio, animation, graphics processing, lighting, shading, rasterizing, ray tracing, shading, culling, transformation, artificial intelligence, and the like.
GPU resource 365 is responsible and/or configured for rendering objects (e.g., writing color or normal vector values of object pixels to a multiple render target (MRT)) and executing a synchronous computing kernel (e.g., a full screen effect on the resulting MRTs); the synchronization computation to be performed and the objects to be rendered are specified by commands contained in the rendering command buffer 325 that the GPU executes. In particular, GPU resource 365 is configured to render objects and perform synchronous computations (e.g., during execution of synchronous computation cores) when executing commands from rendering command buffer 325, where commands and/or operations may depend on other operations such that they execute in sequence.
For example, GPU resource 365 is configured to perform synchronization computations and/or rendering of objects using one or more rendering command buffers 325 (e.g., rendering command buffer 325a, rendering command buffer 325b … … rendering command buffer 325 n). In one embodiment, each GPU in GPU resources 365 may have its own command buffer. Alternatively, when each GPU is rendering substantially the same set of objects (e.g., due to the small size of the regions), the GPUs in GPU resource 365 may use the same command buffer or the same set of command buffers. Further, each GPU in GPU resources 365 may support the ability for commands to be executed by one GPU but not another. For example, a flag on a draw command, or predication in a rendering command buffer, allows a single GPU to execute one or more commands in the corresponding command buffer while the other GPUs ignore those commands. For example, rendering command buffer 325a may support flag 330a, rendering command buffer 325b may support flag 330b … …, and rendering command buffer 325n may support flag 330n.
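The per-GPU flag mechanism described above can be sketched as follows (a hedged illustration under assumed names; real GPUs implement this in hardware via predicated command execution, not as Python): every GPU walks the same shared command buffer, but each command may carry a mask naming which GPUs should execute it, and other GPUs skip it.

```python
# Sketch of per-GPU command predication over a shared command buffer.

def execute_command_buffer(commands, gpu_id):
    """commands is a list of (gpu_mask, payload) pairs, where gpu_mask is
    None for 'all GPUs' or a set of GPU ids. Returns the payloads this
    particular GPU executes; masked-out commands are skipped."""
    executed = []
    for gpu_mask, payload in commands:
        if gpu_mask is None or gpu_id in gpu_mask:
            executed.append(payload)
    return executed
```

In this model, state-setting commands are typically unmasked (executed by every GPU, keeping configurations consistent), while individual draw commands can be flagged for a single GPU.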
The performance of the synchronous computation (e.g., execution of the synchronous computation core) and the rendering of the object are part of the overall rendering. For example, if a video game is run at 60Hz (e.g., 60 frames per second), execution of all object rendering and synchronization computation kernels for the image frames typically must be completed within about 16.67 milliseconds (e.g., one frame at 60 Hz). As previously described, the operations performed when rendering objects and/or executing the synchronous compute kernel are ordered such that the operations may depend on other operations (e.g., a command in a rendering command buffer may need to complete execution before other commands in the rendering command buffer may execute).
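The frame-time budget mentioned above follows from simple arithmetic, shown here as a worked example:

```python
# At a given frame rate, all object rendering and synchronous compute for
# one image frame must fit within the per-frame time budget:
# 1000 ms / 60 frames ≈ 16.67 ms per frame at 60 Hz.

def frame_budget_ms(fps):
    """Per-frame time budget in milliseconds for a target frame rate."""
    return 1000.0 / fps
```

Halving the frame rate to 30 Hz doubles the budget to about 33.33 ms, which is why more complex scenes can be rendered at lower frame rates, or, per this disclosure, at the same frame rate by cooperating GPUs.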
In particular, each rendering command buffer 325 contains various types of commands, including commands that affect the corresponding GPU configuration (e.g., commands that specify the location and format of the rendering target), as well as commands that render objects and/or execute the synchronized compute kernel. For purposes of illustration, the synchronization computation performed when executing the synchronization computation kernel may include performing a full screen effect when objects are all rendered to one or more corresponding multi-render targets (MRTs).
Further, when the GPU resources 365 render objects for image frames and/or execute a synchronous compute kernel when generating image frames, each GPU 365a, 365b … … 365 n of the GPU resources 365 is configured via its registers. For example, GPU 365a is configured via its registers 340 (e.g., register 340a, register 340b … … register 340 n) to perform rendering or compute kernel execution in some manner. That is, when executing commands in the render command buffer 325 for rendering objects for image frames and/or executing a synchronous compute kernel, the values stored in the registers 340 define the hardware context (e.g., GPU configuration or GPU state) for the GPU 365a. Each GPU in GPU resources 365 may be similarly configured, such that GPU 365b is configured via its registers 350 (e.g., register 350a, register 350b … … register 350 n) to perform rendering or compute kernel execution in some manner; … … and GPU 365n is configured via its registers 370 (e.g., register 370a, register 370b … … register 370 n) to perform rendering or compute kernel execution in some manner.
Some examples of GPU configurations include the location and format of a render target (e.g., MRT). Further, other examples of GPU configurations include operating procedures. For example, in rendering an object, the Z value of each pixel of the object may be compared to the Z buffer in various ways. For example, an object pixel is written only when the object Z value matches a value in the Z-buffer. Alternatively, the object pixel may be written only when the object Z value is equal to or less than the value in the Z-buffer. The type of test being performed is defined in the GPU configuration.
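The configurable Z-test described above can be sketched as follows (a minimal illustration; the mode names and dictionary dispatch are assumptions made for this example — on real hardware the comparison mode is a register setting in the GPU state, not a per-command argument):

```python
# Sketch of configurable depth-test modes: which comparison is applied is
# part of the GPU configuration (hardware context), selected here by name.

Z_TESTS = {
    "equal":      lambda obj_z, buf_z: obj_z == buf_z,   # write only on exact match
    "less_equal": lambda obj_z, buf_z: obj_z <= buf_z,   # write when not farther
}

def depth_test_write(mode, obj_z, buf_z):
    """Return True if the object pixel passes the configured Z test and
    should therefore be written."""
    return Z_TESTS[mode](obj_z, buf_z)
```

Real GPUs expose several more comparison functions (e.g., less, greater, always), but these two suffice to illustrate that the test performed is defined by configuration, not by the draw command itself.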
Fig. 4 is a simplified schematic diagram of a rendering architecture of a graphics pipeline 400 configured for multiple GPU processing such that multiple GPUs cooperate to render a single image, according to an implementation of the present disclosure. Graphics pipeline 400 illustrates a general process for rendering images using a 3D (three-dimensional) polygon rendering process. Graphics pipeline 400 for rendering images outputs corresponding color information for each pixel in a display, where the color information may represent texture and shading (e.g., color, shading, etc.). Graphics pipeline 400 may be implemented within client device 110, game server 160, game name processing engine 111, and/or GPU resources 365 of fig. 1 and 3. That is, various architectures may include multiple GPUs that cooperate to render a single image by performing multiple GPU rendering of geometry for an application through region testing at the time of rendering, such as multiple GPU rendering that implements geometry between one or more cloud game servers of a cloud game system, or multiple GPU rendering that implements geometry within a stand-alone system (such as a personal computer or game console that includes a high-end graphics card with multiple GPUs), and so forth.
As shown, the graphics pipeline receives an input geometry 405. For example, geometry processing stage 410 receives the input geometry 405. The input geometry 405 may include vertices within the 3D game world, as well as information corresponding to each vertex. A given object in the game world may be represented using polygons (e.g., triangles) defined by vertices, where the surfaces of the corresponding polygons are then processed through graphics pipeline 400 to achieve a final effect (e.g., color, texture, etc.). Vertex attributes may include a normal (e.g., the direction perpendicular to the geometry at that location), a color (e.g., RGB - red, green, and blue triples, etc.), and texture coordinate/mapping information.
Geometry processing stage 410 is responsible for (and capable of) both vertex processing (e.g., via a vertex shader) and primitive processing. In particular, geometry processing stage 410 may output a set of vertices defining primitives and pass them to the next stage of graphics pipeline 400, together with the positions (precisely, homogeneous coordinates) and various other parameters of those vertices. The positions are placed in position cache 450 for later access by shader stages. The other parameters are placed in parameter cache 460, again for access by subsequent shader stages.
The geometry processing stage 410 may perform various operations, such as performing lighting and shadow calculations on primitives and/or polygons. In one embodiment, because the geometry stage is capable of handling primitives, it may perform backface culling and/or clipping (e.g., testing against the view frustum), thereby reducing the load on downstream stages (e.g., rasterization stage 420, etc.). In another embodiment, the geometry stage may generate primitives (e.g., with functionality equivalent to a conventional geometry shader).
The primitives output by the geometry processing stage 410 are fed to a rasterization stage 420, which rasterizes the primitives into a raster image composed of pixels. In particular, the rasterization stage 420 is configured to project objects in the scene onto a two-dimensional (2D) image plane defined by the viewing location in the 3D game world (e.g., camera location, user eye location, etc.). At its simplest level, the rasterization stage 420 looks at each primitive and determines which pixels are affected by the corresponding primitive. In particular, rasterizer 420 divides the primitive into pixel-sized fragments, where each fragment corresponds to a pixel in the display. It is important to note that one or more fragments may affect the color of a corresponding pixel when the image is displayed.
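The coverage decision described above can be sketched with edge functions evaluated at pixel centers. This is a minimal illustrative sketch, not the patent's actual hardware rasterizer; the helper names and the brute-force loop over the whole grid are assumptions.

```python
# Minimal rasterization sketch: for each pixel center, edge functions
# decide whether the pixel lies inside the triangle.

def edge(ax, ay, bx, by, px, py):
    # Signed area test: non-negative when (px, py) lies to the left of
    # the directed edge (ax, ay) -> (bx, by) for counter-clockwise winding.
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

def rasterize(tri, width, height):
    """Return the (x, y) pixels whose centers the triangle covers."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    covered = []
    for y in range(height):
        for x in range(width):
            px, py = x + 0.5, y + 0.5  # sample at the pixel center
            inside = (edge(x1, y1, x2, y2, px, py) >= 0 and
                      edge(x2, y2, x0, y0, px, py) >= 0 and
                      edge(x0, y0, x1, y1, px, py) >= 0)
            if inside:
                covered.append((x, y))
    return covered

# A right triangle covering the lower-left half of an 8x8 grid.
frags = rasterize([(0, 0), (8, 0), (0, 8)], 8, 8)
```

Each covered pixel corresponds to a fragment handed to the later pixel processing stage; a real rasterizer additionally applies fill rules (e.g., a top-left rule) so that fragments along a shared edge are not generated twice.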
As previously described, the rasterization stage 420 may also perform additional operations with respect to the viewing location, such as clipping (identifying and ignoring fragments outside the view frustum) and culling (ignoring fragments that are occluded by nearer objects). With respect to clipping, the geometry processing stage 410 and/or the rasterization stage 420 may be configured to identify and ignore primitives outside of the view frustum defined by the viewing location in the game world.
The pixel processing stage 430 uses parameters created by the geometry processing stage, as well as other data, to generate values such as the resulting color of each pixel. In particular, at its core the pixel processing stage 430 performs shading operations on the fragments to determine how the color and brightness of the primitives vary with the available illumination. For example, pixel processing stage 430 may determine the depth, color, normal, and texture coordinates (e.g., texture details) of each fragment, and may further determine appropriate levels of brightness, darkness, and color for the fragment. In particular, pixel processing stage 430 calculates features of each fragment, including color and other attributes (e.g., z-depth, representing distance from the viewing position, and alpha values, representing transparency). Furthermore, the pixel processing stage 430 applies lighting effects to the fragments based on the available lighting affecting the corresponding fragments. In addition, pixel processing stage 430 may apply shadow effects to each fragment.
The output of the pixel processing stage 430 includes processed fragments (e.g., texture and shading information) and is passed to an output merge stage 440 as the next stage of the graphics pipeline 400. The output merge stage 440 uses the output of the pixel processing stage 430, as well as other data, such as values already in memory, to generate a final color for each pixel. For example, the output merge stage 440 may optionally blend the values determined for the fragments and/or pixels by the pixel processing stage 430 with values already written to the MRTs (multiple render targets) for that pixel.
The color value of each pixel in the display may be stored in a frame buffer (not shown). When a corresponding image of the scene is displayed, these values are scanned to the corresponding pixels. In particular, the display reads color values from the frame buffer of each pixel, row by row, from left to right or right to left, top to bottom or bottom to top, or in any other pattern, and uses these pixel values to illuminate the pixels when displaying an image.
With detailed descriptions of cloud gaming network 190 (e.g., in game server 160) and GPU resources 365 of fig. 1-3, flowchart 500 of fig. 5 illustrates a graphics processing method when implementing multiple GPU rendering of geometry for application-generated image frames by pre-testing the geometry for interlaced screen regions prior to rendering, according to one embodiment of the present disclosure. In this way, multiple GPU resources are used to efficiently perform rendering of objects when executing an application. As previously described, various architectures may include multiple GPUs that cooperate to render a single image by performing multiple GPU rendering of geometry for an application through region testing at the time of rendering, such as within one or more cloud game servers of a cloud game system, or within a stand-alone system (such as a personal computer or game console that includes a high-end graphics card with multiple GPUs), and so forth.
At 510, the method includes rendering graphics for an application using a plurality of Graphics Processing Units (GPUs) that cooperatively generate an image. In particular, the multiple GPU processing is performed while rendering each of a single image frame and/or one or more image frames in a sequence of image frames for real-time applications.
At 520, the method includes dividing responsibility for rendering geometry of the graphics among the plurality of GPUs based on the plurality of screen regions. That is, each GPU has a corresponding division of responsibility (e.g., a corresponding set of screen regions) known to all GPUs. More specifically, each GPU is responsible for rendering geometry in a corresponding set of screen regions of the plurality of screen regions, wherein the corresponding set of screen regions includes one or more screen regions. For example, a first GPU has a first division of responsibility for rendering objects in a first set of screen regions. Likewise, a second GPU has a second division of responsibility for rendering objects in a second set of screen regions. And so on for the remaining GPUs.
At 530, the method includes assigning a first geometry of an image frame generated during execution of the application to the first GPU for geometry testing. For example, an image frame may include one or more objects, where each object may be defined by one or more pieces of geometry. That is, in one embodiment, geometry pre-testing and rendering is performed on a piece of geometry that is an entire object. In other embodiments, geometry pre-testing and rendering is performed on a piece of geometry that is a portion of an entire object.
For example, each of the plurality of GPUs is assigned a corresponding portion of the geometry associated with the image frame. In particular, each portion of the geometry is assigned to a corresponding GPU for geometry pre-testing. In one embodiment, the geometry may be distributed evenly among the multiple GPUs. For example, if there are four GPUs in the plurality of GPUs, each GPU may process a quarter of the geometry of the image frame. In other embodiments, the geometry may be distributed unevenly among the multiple GPUs. For example, in an example of multiple GPU rendering of an image frame using four GPUs, one GPU may process more of the image frame's geometry than another GPU.
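The even split described above can be sketched as a simple round-robin assignment. The function name and the round-robin choice are illustrative assumptions; as noted, the patent also permits uneven distributions.

```python
# Assign each geometry of an image frame to a GPU for pre-testing.
# Round-robin yields an even split when the count divides evenly.

def split_geometry(geometry_ids, num_gpus):
    """Return one list of geometry ids per GPU, covering all geometry."""
    portions = [[] for _ in range(num_gpus)]
    for i, gid in enumerate(geometry_ids):
        portions[i % num_gpus].append(gid)
    return portions

# 100 geometries split among four GPUs: 25 each.
portions = split_geometry(list(range(100)), 4)
```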
At 540, the method includes performing a geometry pre-test at the first GPU to generate information regarding how the first geometry relates to the plurality of screen regions. In particular, the first GPU generates information about the first geometry and how it relates to each of the plurality of screen regions. For example, the geometry pre-test by the first GPU may determine whether the geometry overlaps a particular screen region allocated to a corresponding GPU for object rendering. The first geometry may overlap screen regions that other GPUs are responsible for rendering and/or screen regions that the first GPU itself is responsible for rendering. In one implementation, the geometry test is performed by a shader, via commands in a corresponding command buffer executed by the first GPU, before rendering of the geometry is performed by any of the plurality of GPUs. In other embodiments, the geometry test is performed by hardware, such as in the rasterization stage 420 of the graphics pipeline 400.
Geometry pre-testing is typically performed in an implementation by multiple GPUs simultaneously for all geometry of a corresponding image frame. That is, each GPU performs geometry pre-testing for its portion of the geometry of the corresponding image frame. In this way, geometry pre-testing allows each GPU to know which geometries to render and which geometries to skip. In particular, when the corresponding GPU performs a geometry pre-test, it tests its portion of the geometry against the screen regions of each of the plurality of GPUs used to render the image frame. For example, if there are four GPUs, each GPU may perform the geometry test on a quarter of the image frame's geometry, particularly if the geometry is distributed evenly among the GPUs for geometry testing. Thus, although each GPU performs the geometry pre-test only for its own portion of the geometry of the corresponding image frame, because the pre-test is typically performed simultaneously across the multiple GPUs for all geometry of the image frame, the generated information indicates how all of the geometry in the image frame relates to the screen regions of all of the GPUs, where each screen region is assigned to a corresponding GPU for object rendering, and where rendering may be performed on each geometry (e.g., an entire object or a portion of an object).
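A pre-test of this kind can be sketched as a screen-space bounding-box overlap check against every GPU's regions. The rectangle encoding and helper names here are assumptions for illustration; the patent's actual test (e.g., when performed in the rasterization stage) may differ.

```python
# For each geometry, record which GPUs own at least one screen region
# that its screen-space bounding box overlaps. Rectangles are
# (x0, y0, x1, y1) with exclusive upper bounds.

def overlaps(box, region):
    (bx0, by0, bx1, by1) = box
    (rx0, ry0, rx1, ry1) = region
    return bx0 < rx1 and rx0 < bx1 and by0 < ry1 and ry0 < by1

def pretest(geometry_boxes, regions_by_gpu):
    """Return {geometry_id: set of GPU ids whose regions it overlaps}."""
    return {
        gid: {gpu
              for gpu, regions in regions_by_gpu.items()
              for region in regions
              if overlaps(box, region)}
        for gid, box in geometry_boxes.items()
    }

# Two GPUs splitting a 1920x1080 screen into left/right halves.
regions_by_gpu = {0: [(0, 0, 960, 1080)], 1: [(960, 0, 1920, 1080)]}
info = pretest({"a": (100, 100, 200, 200),   # entirely in the left half
                "b": (900, 0, 1000, 50)},    # straddles the split
               regions_by_gpu)
```

The resulting mapping is exactly the "information" the method broadcasts: each rendering GPU can look up, per geometry, whether any of its own regions is affected.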
At 550, the method includes using the information at each of the plurality of GPUs when rendering the geometry (e.g., including fully rendering the geometry or skipping rendering of the geometry). That is, the information is used at each of the plurality of GPUs to render the geometry, where the test results (e.g., the information) for each geometry are sent to the other GPUs so that the information is known to every GPU. For example, the geometries in an image frame are typically rendered simultaneously by the multiple GPUs in an implementation. In particular, when a geometry overlaps any screen region allocated to a corresponding GPU for object rendering, that GPU will render the geometry based on the information. On the other hand, when a geometry does not overlap any screen region allocated to the corresponding GPU for object rendering, that GPU may skip rendering the geometry based on the information. The information therefore allows all GPUs to render the geometry of an image frame more efficiently and/or to avoid rendering some geometry altogether. For example, rendering may be performed by shaders in corresponding command buffers that are executed by the multiple GPUs. As will be described more fully below in figs. 7A, 12A, and 13A, a shader may be configured to perform one or both of geometry testing and/or rendering based on the corresponding GPU configuration.
In accordance with one embodiment of the present disclosure, in some architectures, if a corresponding rendering GPU receives the corresponding information in time to use it, that GPU will use the information in deciding which geometry within the corresponding image to render. That is, the information may be used as a hint. Otherwise, the rendering GPU will process the geometry as usual. Using an example in which the information indicates whether a geometry overlaps any screen region allocated to the rendering GPU (e.g., a second GPU), the rendering GPU may skip rendering the geometry entirely if the information indicates that it does not overlap. Further, if only some of the geometries do not overlap, the second GPU may skip rendering of at least those geometries that do not overlap any screen region allocated to the second GPU for object rendering. On the other hand, the information may indicate that a geometry does overlap, in which case the second GPU or rendering GPU will render the geometry. Also, the information may indicate that only certain geometries overlap screen regions allocated to the second GPU or rendering GPU for object rendering; in that case, the second GPU or rendering GPU will render only those geometries that overlap. In yet another embodiment, if there is no information, or if the information is not generated or received in time, the second GPU will perform rendering normally (e.g., render the geometry). Thus, the information, provided as a hint, may improve the overall efficiency of the graphics processing system if it is received in time. If the information is not received in time, the graphics processing system still operates normally without it.
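The hint logic above can be sketched as follows. This is a hypothetical helper, not the patent's implementation; `None` stands in for information that did not arrive in time, and a geometry with no entry in the hint is rendered as the safe default.

```python
# Decide which geometries a rendering GPU should draw, treating the
# pre-test information strictly as a hint.

def geometries_to_render(my_gpu, all_geometry, hint_info):
    if hint_info is None:
        # Hint missed its window: render everything normally.
        return list(all_geometry)
    # Missing entries default to "render it" rather than risk a hole
    # in the image.
    return [g for g in all_geometry
            if my_gpu in hint_info.get(g, {my_gpu})]

hints = {"g0": {0}, "g1": {1}, "g2": {0, 1}}
on_time = geometries_to_render(0, ["g0", "g1", "g2", "g3"], hints)
too_late = geometries_to_render(0, ["g0", "g1", "g2", "g3"], None)
```

Note the design choice: because the hint can only remove work that is provably invisible to this GPU, a late or missing hint degrades efficiency but never correctness, matching the fallback behavior described in the text.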
In one embodiment, one GPU (e.g., a pre-test GPU) is dedicated to performing the geometry pre-test to generate the information. That is, the dedicated GPU is not used to render objects (e.g., geometry) in the corresponding image frame. Specifically, as previously described, multiple GPUs are used to render the graphics of an application. Responsibility for rendering the geometry of the graphics is divided among the multiple GPUs based on a plurality of screen regions that may be interlaced, where each GPU has a corresponding division of responsibility known to the multiple GPUs. A geometry test is performed at the pre-test GPU on a plurality of geometries of an image frame generated by the application, to generate information about each geometry and its relationship to each of the plurality of screen regions. The plurality of geometries is then rendered at each of the plurality of GPUs using the information generated for each geometry. That is, the information is used when each geometry is rendered by a corresponding one of the GPUs used to render the image frame.
Figs. 6A-6B illustrate, purely for illustrative purposes, rendering with a screen subdivided into regions and sub-regions. It will be appreciated that the number of subdivided regions and/or sub-regions may be selected for efficient multi-GPU processing of a single image and/or each of one or more images in a sequence of images. That is, the screen may be subdivided into two or more regions, wherein each region may be further divided into sub-regions. In one embodiment of the present disclosure, the screen is subdivided into four quadrants, as shown in fig. 6A. In another embodiment of the present disclosure, the screen is subdivided into a greater number of interlaced regions, as shown in fig. 6B. The following discussion of figs. 6A-6B is intended to illustrate inefficiencies that occur when performing multi-GPU rendering over screen regions assigned to multiple GPUs; figs. 7A-7C and 8A-8B illustrate more efficient rendering according to some embodiments of the invention.
In particular, fig. 6A is a schematic diagram of a screen 610A subdivided into quadrants (e.g., four regions) when performing multiple GPU rendering. As shown, screen 610A is subdivided into four quadrants (e.g., A, B, C and D). Each quadrant is assigned to one of four GPUs [ GPU-A, GPU-B, GPU-C and GPU-D ] in a one-to-one relationship. For example, GPU-A is assigned to quadrant A, GPU-B is assigned to quadrant B, GPU-C is assigned to quadrant C, and GPU-D is assigned to quadrant D.
The geometry may be culled. For example, the CPU 163 may check each object's bounding box against the frustum of each quadrant and request that each GPU render only the objects that overlap its corresponding frustum. The result is that each GPU is responsible for rendering only a portion of the geometry. For purposes of illustration, screen 610A shows geometries that are each a corresponding object, i.e., objects 611-617. GPU-A will not render any objects because no objects overlap quadrant A. GPU-B will render objects 615 and 616 (because a portion of object 615 is present in quadrant B, the CPU's culling test will correctly conclude that GPU-B must render it). GPU-C will render objects 611 and 612. GPU-D will render objects 612, 613, 614, 615, and 617.
In fig. 6A, when screen 610A is divided into quadrants A-D, the amount of work each GPU must perform may be very different, because in some cases a disproportionate amount of the geometry may fall within one quadrant. For example, quadrant A does not contain any geometry, while quadrant D contains five geometries, or at least portions of five geometries. Thus, GPU-A assigned to quadrant A will be idle, while GPU-D assigned to quadrant D will be exceptionally busy rendering objects in the corresponding image.
Fig. 6B illustrates another technique for subdividing a screen into regions. In particular, when performing multiple GPU rendering of a single image or each of one or more images in a sequence of images, screen 610B is subdivided not into quadrants but into multiple interleaved regions. In this case, screen 610B is subdivided into a greater number of interleaved regions (e.g., more than four) while rendering with the same number of GPUs (e.g., four). The objects (611-617) shown in screen 610A are also displayed at the same corresponding locations in screen 610B.
In particular, four GPUs (e.g., GPU-A, GPU-B, GPU-C and GPU-D) are used to render images for corresponding applications. Each GPU is responsible for rendering geometry that overlaps with a corresponding region. That is, each GPU is assigned to a corresponding set of regions. For example, GPU-A is responsible for each region labeled A in the corresponding group, GPU-B is responsible for each region labeled B in the corresponding group, GPU-C is responsible for each region labeled C in the corresponding group, and GPU-D is responsible for each region labeled D in the corresponding group.
Furthermore, these regions are staggered in a particular pattern. Because of the interleaving of regions (and the greater number of regions), the amount of work that each GPU must perform may be more balanced. For example, the interlaced pattern of screen 610B includes alternating rows including regions A-B-A-B, etc., and regions C-D-C-D, etc. Other modes of interleaving regions are supported in embodiments of the present disclosure. For example, the pattern may include a repeating sequence of regions, uniformly distributed regions, unevenly distributed regions, rows of repeatable regions, random sequences of regions, rows of random sequences of regions, and so forth.
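The alternating-row pattern described here can be sketched as a small grid generator. The function name and the fixed A/B/C/D labeling are illustrative assumptions; as the text notes, many other interleaving patterns are possible.

```python
# Build the FIG. 6B-style interleaving: even rows alternate A-B-A-B,
# odd rows alternate C-D-C-D, so the four GPUs' regions tile the screen.

def interleaved_pattern(rows, cols):
    grid = []
    for r in range(rows):
        pair = ("A", "B") if r % 2 == 0 else ("C", "D")
        grid.append([pair[c % 2] for c in range(cols)])
    return grid

pattern = interleaved_pattern(4, 4)
```

With a 4x4 grid, each GPU owns four regions spread across the screen, which is what balances the workload when geometry clusters in one part of the image.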
Selecting the number of regions is important. For example, if the distribution of regions is too fine (e.g., the number of regions is too large), each GPU must still process most or all of the geometry. For example, it may be difficult to check an object's bounding box against all of the regions for which a GPU is responsible in a timely fashion. Furthermore, even if the bounding boxes could be checked in time, the result would be that each GPU may still have to process a large portion of the geometry, because each object in the image overlaps at least one region of each GPU (e.g., a GPU processes the entire object even though only a portion of the object overlaps at least one region of the set of regions allocated to that GPU).
Therefore, the choice of the number of regions, the interleaving pattern, and the like is important. Selecting too few or too many regions, or an inefficient interleaving pattern, may result in inefficient GPU processing (e.g., each GPU processes most or all of the geometry). In these cases, although there are multiple GPUs for rendering the image, the correspondingly increased screen pixel count and geometry density cannot be supported due to GPU inefficiency (i.e., four GPUs cannot write four times the pixels and process four times the vertices or primitives). The embodiments that follow are directed to improvements in culling strategy (figs. 7A-7C) and culling granularity (figs. 8A-8B), as well as other advances.
Fig. 7A-7C are schematic diagrams illustrating the use of multiple GPUs to render each of at least one or more images in a single image and/or a sequence of images in embodiments of the present disclosure. The four GPUs are chosen purely to facilitate illustration of multi-GPU rendering when rendering images while executing an application, and it is understood that any number of GPUs may be used for multi-GPU rendering in various embodiments.
In particular, fig. 7A is a schematic diagram of a rendering command buffer 700A shared by multiple GPUs that cooperate to render a single image frame, according to one embodiment of the present disclosure. That is, in this example, the multiple GPUs each use the same rendering command buffer (e.g., buffer 700A), and each GPU executes all commands in the rendering command buffer. A plurality of commands (a complete set) is loaded into the rendering command buffer 700A and used to render the corresponding image frame. It is to be appreciated that one or more rendering command buffers can be utilized to generate a corresponding image frame. In one example, the CPU generates one or more draw calls for the image frame, wherein the draw calls include commands placed in one or more rendering command buffers for one or more GPUs of GPU resources 365 of fig. 3 to execute when performing multi-GPU rendering of the corresponding image. In some implementations, the CPU 163 may request that one or more GPUs generate all or some of the draw calls for rendering the corresponding image. Further, fig. 7A may show the entire command set contained in rendering command buffer 700A, or only a portion of that command set.
When performing multiple GPU rendering of each of one or more images in an image or sequence of images, GPUs typically render simultaneously in an embodiment. The rendering of the image may be divided into a plurality of phases. In each stage, the GPUs need to be synchronized, so the faster GPU must wait until the slower GPU completes. The commands for rendering command buffer 700A shown in FIG. 7A illustrate one phase. Although commands for only one stage are shown in fig. 7A, the rendering command buffer 700A may include commands for one or more stages in rendering an image, and fig. 7A shows only a portion of all commands, and thus commands for other stages are not shown. In the rendering command buffer 700A illustrated in fig. 7A, which illustrates one stage, there are four objects (e.g., object 0, object 1, object 2, and object 3) to be rendered, as illustrated in fig. 7B-1.
As shown, the rendering command buffer 700A shown in fig. 7A includes commands for geometry testing, object (e.g., geometry) rendering, and commands for configuring the state of one or more rendering GPUs that are executing the commands from the rendering command buffer 700A. For illustration purposes only, the render command buffer 700A shown in FIG. 7A includes commands for geometry pre-testing, rendering objects, and/or executing a synchronous compute kernel when rendering corresponding images for corresponding applications (710-728). In some embodiments, the execution of the geometry pre-test and the object rendering of the image and/or the synchronous computing kernel must be performed within one frame period. Two processing segments are shown in rendering command buffer 700A. In particular, processing segment 1 includes pre-test or geometry test 701, and segment 2 includes rendering 702.
Segment 1 includes performing a geometry test 701 on objects in an image frame, where each object may be defined by one or more geometries. The pre-test or geometry test 701 may be performed by one or more shaders. For example, in one embodiment, a portion of the geometry of an image frame is assigned to each GPU used in a multi-GPU rendering of the corresponding image frame to perform a geometry test, where each portion may be assigned for pre-testing. The assigned portion may include one or more geometric figures, where each geometric figure may include the entire object, or may include a portion of the object (e.g., vertices, primitives, etc.). In particular, a geometry test is performed on a geometry to generate information about how the geometry relates to each of a plurality of screen regions. For example, the geometry test may determine whether a geometry overlaps a particular screen region allocated to the corresponding GPU for object rendering.
As shown in fig. 7A, geometry test 701 (e.g., a pre-test of geometry) for segment 1 includes commands for configuring the state of one or more GPUs executing commands from rendering command buffer 700A, and commands for executing geometry tests. In particular, the GPU state of each GPU is configured prior to the GPU performing geometry testing on the corresponding object. For example, commands 710, 713, and 715 are each used to configure GPU states of one or more GPUs for executing commands for geometry testing. As shown, command 710 configures the GPU state so that geometry test commands 711-712 can be properly executed, where command 711 performs geometry testing on object 0 and command 712 performs geometry testing on object 1. Similarly, command 713 configures the GPU state so that geometry test command 714 can perform geometry testing on object 2. Likewise, command 715 configures the GPU state so that geometry test command 716 may perform geometry testing on object 3. It is to be appreciated that GPU states can be configured for one or more geometry test commands (e.g., test commands 711 and 712).
As previously described, when executing commands in rendering command buffer 700A for geometry testing and/or rendering objects for a corresponding image and/or executing a synchronous compute kernel, the values stored in the registers define the hardware context (e.g., GPU configuration) for the corresponding GPU. As shown, the GPU state may be modified throughout the processing of commands in the render command buffer 700A, and each subsequent segment of commands may be used to configure the GPU state. As applied to fig. 7A, and when setting the GPU state is mentioned throughout the specification, the GPU state may be set in a variety of ways. For example, the CPU or GPU may set a value in Random Access Memory (RAM) where the GPU will check the value in RAM. In another example, the state may be internal to the GPU, such as when the command buffer is called twice as a subroutine, the internal GPU state being different between the two subroutine calls.
Segment 2 includes performing rendering 702 of objects in an image frame, wherein the geometry is rendered. Rendering 702 may be performed by one or more shaders in command buffer 700A. As shown in fig. 7A, the rendering 702 of segment 2 includes commands for configuring the state of one or more GPUs executing commands from rendering command buffer 700A, and commands for executing rendering. In particular, the GPU state of each GPU is configured before the GPU renders the corresponding object (e.g., geometry). For example, commands 721, 723, 725, and 727 are each used to configure the GPU state of one or more GPUs for executing rendering commands. As shown, command 721 configures the GPU state so that render command 722 may render object 0; command 723 configures the GPU state so that render command 724 may render object 1; command 725 configures the GPU state so that render command 726 may render object 2; and command 727 configures the GPU state so that render command 728 may render object 3. While fig. 7A illustrates configuring the GPU state for each rendering command (e.g., rendering object 0, etc.), it should be understood that the GPU state may be configured for one or more rendering commands.
As previously described, each GPU used in the multi-GPU rendering of the corresponding image frame renders its corresponding geometry based on the information generated during the geometry pre-test. Specifically, the information, known to each GPU, provides the relationship between objects and screen regions. A GPU may use this information to render its corresponding geometry efficiently if the information is received in time for rendering. Specifically, as indicated by the information, a GPU performs rendering of a geometry when the geometry overlaps any one or more screen regions allocated to that GPU for object rendering. On the other hand, the information may indicate that the first GPU should skip rendering a geometry entirely (e.g., the geometry does not overlap any screen region for which the first GPU is responsible for object rendering). In this way, each GPU renders only the geometry that overlaps the screen region or regions for which it is responsible for object rendering. The information is thus provided as a hint to each GPU so that, if it is received before rendering begins, it is taken into account by each GPU when rendering the geometry. In one embodiment, if the information is not received in time, rendering proceeds normally, such that the corresponding geometry is fully rendered by the corresponding GPU, regardless of whether the geometry overlaps any screen region allocated to that GPU for object rendering.
For illustration purposes only, four GPUs divide the corresponding screen into regions between them. As previously described, each GPU is responsible for rendering objects in a corresponding set of regions, where the corresponding set includes one or more regions. In one implementation, rendering command buffer 700A is shared by multiple GPUs that cooperatively render a single image. That is, GPUs for multiple GPU rendering of each of a single image or one or more images in a sequence of images share a common command buffer. In another embodiment, each GPU may have its own command buffer.
Alternatively, in yet another implementation, each GPU may render a slightly different set of objects. This may occur when it can be determined that a particular GPU does not need to render a particular object because the object does not overlap its corresponding screen regions, such as those in the corresponding group. The multiple GPUs may still use the same command buffer (e.g., share one command buffer) as long as the command buffer supports the ability for commands to be executed by one GPU but not another, as previously described. For example, execution of certain commands in shared rendering command buffer 700A may be limited to one of the rendering GPUs. This can be accomplished in a number of ways. In one example, a flag may be used on the corresponding command to indicate which GPUs should execute it. In another example, predication may be implemented using bits in the rendering command buffer to control which GPU executes which commands under which conditions. An example of predication is: if this is GPU-A, skip the following X commands.
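Per-command GPU flags of the kind mentioned above can be sketched with a bitmask per command. The encoding below is a made-up illustration, not the patent's or any GPU's actual command format.

```python
# Execute, for one GPU, only the commands of a shared buffer whose
# mask includes that GPU. Bit i of the mask corresponds to GPU i
# (bit 0 = GPU-A, bit 1 = GPU-B, and so on).

def execute(commands, gpu_id):
    """Return the names of the commands this GPU actually executes."""
    return [name for mask, name in commands if mask & (1 << gpu_id)]

shared_buffer = [
    (0b1111, "set_state"),     # all four GPUs
    (0b0001, "draw_object0"),  # GPU-A only
    (0b1110, "draw_object1"),  # GPU-B, GPU-C, GPU-D
]
```

This is one way a single shared buffer can drive GPUs that render slightly different object sets; the "skip the following X commands" predication described in the text achieves the same effect with a skip count instead of a per-command mask.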
In yet another embodiment, since each GPU is rendering substantially the same set of objects, multiple GPUs may still use the same command buffer. For example, when the area is relatively small, each GPU may still render all objects, as previously described.
Fig. 7B-1 illustrates a screen 700B showing an image including four objects rendered by multiple GPUs using the rendering command buffer 700A of fig. 7A, according to one embodiment of the present disclosure. According to one embodiment of the present disclosure, multiple GPU rendering of geometry is performed for an application by pre-testing the geometry for screen regions that may be interlaced prior to rendering the geometry corresponding to objects in an image frame.
In particular, rendering responsibility for the geometry is divided among the plurality of GPUs by screen region, wherein the plurality of screen regions is configured to reduce rendering time imbalance among the plurality of GPUs. For example, screen 700B illustrates the screen area responsibilities of each GPU when rendering objects of an image. Four GPUs (GPU-A, GPU-B, GPU-C and GPU-D) are used to render objects in the image shown on screen 700B. Screen 700B is more finely divided than the quadrants shown in fig. 6A in an effort to balance pixel and vertex loads between GPUs. In addition, screen 700B is divided into regions that may be interleaved. For example, the interleaving includes multiple rows of regions. Each of rows 731 and 733 includes regions A alternating with regions B. Each of rows 732 and 734 includes regions C alternating with regions D. More specifically, in one pattern, rows including regions A and B alternate with rows including regions C and D.
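A minimal sketch of this interleaving (region coordinates and the four-GPU pattern are illustrative assumptions):

```python
def gpu_for_region(row, col):
    """Map a screen region at (row, col) to a GPU using the described pattern:
    rows of alternating A/B regions interleave with rows of alternating C/D regions."""
    if row % 2 == 0:                 # e.g., rows 731 and 733: A, B, A, B, ...
        return "A" if col % 2 == 0 else "B"
    else:                            # e.g., rows 732 and 734: C, D, C, D, ...
        return "C" if col % 2 == 0 else "D"

# First two rows of the checkerboard:
top_row = [gpu_for_region(0, c) for c in range(4)]   # A, B, A, B
next_row = [gpu_for_region(1, c) for c in range(4)]  # C, D, C, D
```

With this layout, any moderately sized object tends to touch regions owned by several GPUs, which is what spreads the rendering load.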
As previously described, to achieve GPU processing efficiency, various techniques may be used in dividing the screen into regions, such as increasing or decreasing the number of regions (e.g., selecting the correct number of regions), interleaving the regions, increasing or decreasing the number of regions used for interleaving, selecting a particular pattern when interleaving the regions and/or sub-regions, and so forth. In one embodiment, the size of each of the plurality of screen regions is uniform. In another embodiment, each of the plurality of screen regions is non-uniform in size. In yet another embodiment, the number and size of the plurality of screen regions dynamically change.
Each of the GPUs is responsible for rendering objects in a corresponding set of regions, where each set may include one or more regions. Thus, GPU-A is responsible for rendering objects in each A region in the corresponding group, GPU-B is responsible for rendering objects in each B region in the corresponding group, GPU-C is responsible for rendering objects in each C region in the corresponding group, and GPU-D is responsible for rendering objects in each D region in the corresponding group. There may also be GPUs with other responsibilities so that they may not perform rendering (e.g., perform asynchronous compute kernels that execute over multiple frame periods, perform culling for rendering GPUs, etc.).
The amount of rendering to be performed is different for each GPU. Fig. 7B-2 illustrates a table showing the rendering performed by each GPU when rendering the four objects of fig. 7B-1, according to one embodiment of the present disclosure. As shown in the table, through geometry pre-testing, it can be determined that object 0 is rendered by GPU-B; object 1 is rendered by GPU-C and GPU-D; object 2 is rendered by GPU-A, GPU-B and GPU-D; and object 3 is rendered by GPU-B, GPU-C and GPU-D. There may still be some rendering imbalance, because GPU-A only needs to render object 2, while GPU-D needs to render objects 1, 2, and 3. However, in general, with interleaving of screen regions, the rendering of objects within an image is reasonably balanced among the multiple GPUs when cooperatively rendering the image, or each of one or more images in a sequence of images.
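One way to derive a table like that of fig. 7B-2 is to intersect each object's screen-space bounding box with the interleaved region assignment. The region size, coordinates, and pattern below are illustrative assumptions, not values from the patent:

```python
def gpu_for_region(row, col):
    # Interleaving as described above: A/B rows alternate with C/D rows.
    if row % 2 == 0:
        return "A" if col % 2 == 0 else "B"
    return "C" if col % 2 == 0 else "D"

def gpus_rendering(bbox, region_size=256):
    """Set of GPUs whose regions an object's bounding box (x0, y0, x1, y1) overlaps."""
    x0, y0, x1, y1 = bbox
    return {gpu_for_region(r, c)
            for r in range(y0 // region_size, y1 // region_size + 1)
            for c in range(x0 // region_size, x1 // region_size + 1)}
```

For example, a small object within one 256x256 region is rendered by a single GPU, while a wide object spanning two columns in the top row is rendered by GPU-A and GPU-B.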
Fig. 7C is a schematic diagram illustrating the rendering of each object performed by each GPU when multiple GPUs cooperatively render a single image frame, such as image frame 700B shown in fig. 7B-1, in accordance with one embodiment of the present disclosure. In particular, FIG. 7C illustrates a rendering process of objects 0-3 performed by each of four GPUs (e.g., GPU-A, GPU-B, GPU-C and GPU-D) using the shared rendering command buffer 700A of FIG. 7A.
In particular, two rendering timing diagrams are shown with respect to timeline 740. Rendering timing diagram 700C-1 illustrates multi-GPU rendering of objects 0-3 of a corresponding image in one rendering phase, where each GPU performs rendering without any information about the overlap between objects 0-3 and the screen regions. Rendering timing diagram 700C-2 illustrates multi-GPU rendering of objects 0-3 of the corresponding image in the same rendering phase, where information generated during geometry testing against screen regions (e.g., performed prior to rendering) is shared with each GPU used to render objects 0-3 through the corresponding GPU pipeline. Each of rendering timing diagrams 700C-1 and 700C-2 shows the time it takes for each GPU to process each geometry (e.g., perform geometry testing and rendering). In one embodiment, a geometry is a complete object. In another embodiment, a geometry may be a portion of an object. For illustration purposes, the example of fig. 7C shows the rendering of geometries, where each geometry corresponds to an object in its entirety. In each of rendering timing diagrams 700C-1 and 700C-2, objects whose geometry (e.g., the primitives for the objects) does not overlap at least one screen region of the corresponding GPU (e.g., in the corresponding set of regions) are represented by boxes drawn with dashed lines. On the other hand, objects having geometry that overlaps at least one screen region of the corresponding GPU (e.g., in the corresponding set of regions) are represented by boxes drawn with solid lines.
Rendering timing diagram 700C-1 illustrates rendering objects 0-3 using four GPUs (e.g., GPU-A, GPU-B, GPU-C and GPU-D). Vertical line 755a indicates the beginning of the rendering phase of the object, while vertical line 755b shows the end of the rendering phase of the object in rendering timing diagram 700C-1. The starting and ending points along the timeline 740 of the rendering stage shown represent synchronization points, where each of the four GPUs are synchronized in executing the corresponding GPU pipeline. For example, at vertical line 755B, which indicates the end of the rendering phase, all GPUs must wait for the slowest GPU (e.g., GPU-B) to complete rendering of objects 0-3 through the corresponding graphics pipeline before proceeding to the next rendering phase.
No geometry pre-test is performed in rendering timing diagram 700C-1. Thus, each GPU must process each object through the corresponding graphics pipeline. If there are no pixels to be drawn for the object in any of the regions allocated to the corresponding GPU for object rendering (e.g., in the corresponding group), the GPU may not be able to fully render the object through the graphics pipeline. For example, when objects do not overlap, only the geometry processing stages of the graphics pipeline are executed. However, this still requires some time to handle.
In particular, GPU-A does not fully render objects 0, 1, and 3 because they do not overlap any screen region allocated to GPU-A for object rendering (e.g., in the corresponding group). The rendering of these three objects is shown with dashed boxes, indicating that at least the geometry processing stage executes, but the graphics pipeline does not fully execute. GPU-A completely renders object 2 because that object overlaps at least one screen region allocated to GPU-A for rendering. The rendering of object 2 is shown with a solid box, indicating that all stages of the corresponding graphics pipeline execute. Similarly, GPU-B does not fully render object 1 (shown with a dashed box) (i.e., at least the geometry processing stage executes), but fully renders objects 0, 2, and 3 (shown with solid boxes) because those objects overlap at least one screen region allocated to GPU-B for rendering (e.g., in the corresponding group). Likewise, GPU-C does not fully render objects 0 and 2 (shown with dashed boxes) (i.e., at least the geometry processing stage executes), but fully renders objects 1 and 3 (shown with solid boxes) because those objects overlap at least one screen region (e.g., in the corresponding group) allocated to GPU-C for rendering. Further, GPU-D does not fully render object 0 (shown with a dashed box) (i.e., at least the geometry processing stage executes), but fully renders objects 1, 2, and 3 (shown with solid boxes) because those objects overlap at least one screen region (e.g., in the corresponding group) allocated to GPU-D for rendering.
Rendering timing diagram 700C-2 shows geometry pre-test 701 'and rendering 702' for objects 0-3 using multiple GPUs. Vertical line 750a indicates the beginning of a rendering phase (e.g., including geometry pre-test and rendering) of the object, while vertical line 750b shows the end of the rendering phase of the object in rendering timing diagram 700C-2. The starting and ending points along the timeline 740 of the rendering stage shown in the timing diagram 700C-2 represent synchronization points, where each of the four GPUs are synchronized in executing the corresponding GPU pipeline, as previously described. For example, at vertical line 750B, which indicates the end of the rendering phase, all GPUs must wait for the slowest GPU (e.g., GPU-B) to complete rendering of objects 0-3 through the corresponding graphics pipeline before proceeding to the next rendering phase.
First, geometry pre-test 701' is performed by the GPUs, wherein each GPU performs geometry pre-testing for a subset of the geometry of the image frame against all of the screen regions, wherein each screen region is assigned to a corresponding GPU for object rendering. As previously described, each GPU is assigned a corresponding portion of the geometry associated with the image frame. The geometry pre-test generates information about how a particular geometry relates to each screen region, such as whether the geometry overlaps any screen region (e.g., in a corresponding group) assigned to the corresponding GPU for object rendering. This information is shared with each GPU used to render the image frame. For example, geometry pre-test 701' shown in FIG. 7C includes having GPU-A perform the geometry pre-test on object 0, GPU-B perform the geometry pre-test on object 1, GPU-C perform the geometry pre-test on object 2, and GPU-D perform the geometry pre-test on object 3. The time taken to perform the geometry pre-test may vary depending on the object under test. For example, the geometry pre-test of object 0 takes less time than the geometry pre-test of object 1. This may be due to object size, number of overlapping screen areas, etc.
After geometry pre-testing, each GPU performs rendering on all objects or geometries that intersect its screen area. In one implementation, once the geometry test is complete, each GPU begins rendering its geometry. That is, there are no synchronization points between the geometry test and rendering. This is possible because the geometric test information being generated is considered a hint rather than a hard dependency. For example, GPU-A begins rendering object 2 before GPU-B completes the geometry pre-test for object 1, and thus before GPU-B begins rendering objects 0, 2, and 3.
Vertical line 750a is aligned with vertical line 755a, such that each of rendering timing diagrams 700C-1 and 700C-2 begins its rendering phase for objects 0-3 at the same time. However, the rendering of objects 0-3 shown in rendering timing diagram 700C-2 is performed in a shorter time than the rendering shown in rendering timing diagram 700C-1. That is, vertical line 750b, indicating the end of the rendering phase of the lower timing diagram 700C-2, occurs earlier than the end of the rendering phase of the upper timing diagram 700C-1, indicated by vertical line 755b. In particular, the speed increase 745 when rendering objects 0-3 is achieved when performing multi-GPU rendering of the geometry of the image for an application, including pre-testing the geometry against screen regions prior to rendering, and providing the results of the geometry pre-test as information (e.g., hints). As shown, speed increase 745 is the time difference between vertical line 750b of timing diagram 700C-2 and vertical line 755b of timing diagram 700C-1.
The speed increase is achieved through the generation and sharing of information produced during geometry pre-testing. For example, during the geometry pre-test, GPU-A generates information indicating that object 0 only needs to be rendered by GPU-B. Thus, GPU-B is informed that it should render object 0, while the other GPUs (e.g., GPU-A, GPU-C and GPU-D) may skip the rendering of object 0 entirely, because object 0 does not overlap any region allocated to those GPUs for object rendering (e.g., in the corresponding group). For example, those GPUs need not execute the geometry processing stage, which would otherwise be executed without geometry pre-testing even though the GPUs would not fully render object 0, as shown in timing diagram 700C-1. Also, during geometry pre-testing, GPU-B generates information indicating that object 1 should be rendered by GPU-C and GPU-D, and that GPU-A and GPU-B may skip the rendering of object 1 entirely, because object 1 does not overlap any region allocated to GPU-A or GPU-B for object rendering (e.g., in the respective corresponding groups). Also, during geometry pre-testing, GPU-C generates information indicating that object 2 should be rendered by GPU-A, GPU-B and GPU-D, and that GPU-C may skip the rendering of object 2 entirely, because object 2 does not overlap any region allocated to GPU-C for object rendering (e.g., in the corresponding group). Further, during geometry pre-testing, GPU-D generates information indicating that object 3 should be rendered by GPU-B, GPU-C and GPU-D, and that GPU-A may skip the rendering of object 3 entirely, because object 3 does not overlap any region allocated to GPU-A for object rendering (e.g., in the corresponding group).
Because the information generated by the geometry pre-test is shared between the GPUs, each GPU can determine which objects to render. Thus, after performing the geometry pre-test and sharing the test results with all GPUs, each GPU has information about which objects or geometries need to be rendered by the corresponding GPU. For example, GPU-A renders object 2; GPU-B renders objects 0, 2, and 3; GPU-C renders objects 1 and 3; and GPU-D render objects 1, 2, and 3.
In particular, GPU-B performs geometry pre-testing on object 1 and determines that object 1 may be skipped by GPU-B, because object 1 does not overlap any region allocated to GPU-B for object rendering (e.g., in the corresponding group). Furthermore, object 1 is not rendered at all by GPU-A, because it does not overlap any region allocated to GPU-A for object rendering (e.g., in the corresponding group). Since the determination that object 1 does not overlap any region allocated to GPU-B is made before GPU-B begins geometry processing of object 1, GPU-B skips the rendering of object 1.
Fig. 8A-8B illustrate the testing of objects against screen regions 820A and 820B, where the screen regions may be interleaved (e.g., screen regions 820A and 820B illustrate a portion of a display). In particular, by performing geometry testing prior to rendering an object to the screen, multi-GPU rendering of the object is performed for a single image frame, or for each of one or more image frames in a sequence of image frames. As shown, GPU-A is assigned responsibility for rendering objects in screen area 820A. GPU-B is assigned responsibility for rendering objects in screen area 820B. Information is generated for a "geometry", where the geometry may be an entire object or a portion of an object. For example, a geometry may be object 810, or a portion of object 810.
Fig. 8A illustrates the testing of an object against screen regions when multiple GPUs cooperatively render a single image, according to one embodiment of the present disclosure. As previously described, a geometry may be an object, such that the geometry corresponds to the geometry used or generated by a corresponding draw call. During the geometry pre-test, it may be determined that object 810 overlaps region 820A. That is, portion 810A of object 810 overlaps region 820A. In this case, GPU-A is tasked with rendering object 810. Also, during the geometry pre-test, it may be determined that object 810 overlaps region 820B. That is, portion 810B of object 810 overlaps region 820B. In this case, GPU-B is also tasked with rendering object 810.
Fig. 8B illustrates testing of portions of objects for screen regions and/or screen sub-regions when multiple GPUs cooperatively render a single image frame, according to one embodiment of the present disclosure. That is, the geometry may be part of the object. For example, object 810 may be partitioned into multiple pieces such that the geometry used or generated by a draw call is subdivided into smaller geometries. In one embodiment, each of the geometries is approximately the size of the allocated location cache and/or parameter cache. In that case, information (e.g., one or more hints) is generated for those smaller geometries during geometry testing, where the information is used by the rendering GPU, as previously described.
For example, object 810 is segmented into smaller objects such that the geometry for region testing corresponds to these smaller objects. As shown, object 810 is partitioned into geometric figures "a", "b", "c", "d", "e", and "f". After geometry pre-testing, GPU-A renders only geometry "a", "b", "c", "d" and "e". That is, GPU-A may skip rendering geometry "f". Furthermore, after geometry pre-testing, GPU-B renders only geometry "d", "e", and "f". That is, GPU-B may skip rendering geometries "a", "B", and "c".
In one embodiment, since the geometry processing stage is configured to perform both vertex processing and primitive processing, it is possible to perform the geometry pre-test on a geometry using a shader in the geometry processing stage. For example, the geometry processing stage generates information (e.g., hints), such as by testing the bounding frustum of the geometry against the GPU screen regions, which may be performed as a software shader operation. In one embodiment, the test is accelerated by using dedicated instructions or instructions implemented in hardware, thereby providing a combined software/hardware solution. That is, one or more specialized instructions are used to accelerate the generation of information about the geometry and its relationship to the screen regions. For example, the homogeneous coordinates of the vertices of a primitive of a geometry are provided as inputs to a geometry pre-test instruction of the geometry processing stage. The test may generate a Boolean return value for each GPU indicating whether the primitive overlaps any screen region (e.g., in the corresponding group) allocated to that GPU for object rendering. In this way, the information (e.g., hints) about the corresponding geometry and its relationship to the screen regions generated during the geometry pre-test is generated by a shader in the geometry processing stage.
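A software sketch of such a pre-test follows; the clip-space convention, viewport size, and region layout are assumptions for illustration, and a dedicated hardware instruction would replace the inner loop:

```python
def primitive_overlaps(verts_h, gpu_regions, region_size=256, width=1024, height=1024):
    """Perspective-divide homogeneous clip-space vertices to screen space, take the
    bounding box, and test it against the region set (row, col) owned by one GPU.
    Returns a Boolean, as the pre-test instruction described above would."""
    xs, ys = [], []
    for x, y, z, w in verts_h:
        # viewport transform of NDC [-1, 1] to pixels (assumed convention)
        xs.append((x / w * 0.5 + 0.5) * width)
        ys.append((y / w * 0.5 + 0.5) * height)
    x0, x1, y0, y1 = min(xs), max(xs), min(ys), max(ys)
    for (r, c) in gpu_regions:
        rx0, ry0 = c * region_size, r * region_size
        if x0 < rx0 + region_size and x1 >= rx0 and y0 < ry0 + region_size and y1 >= ry0:
            return True
    return False

# Small triangle near the top-left corner of the screen:
tri = [(-0.9, -0.9, 0.0, 1.0), (-0.8, -0.9, 0.0, 1.0), (-0.9, -0.8, 0.0, 1.0)]
```

A GPU owning region (0, 0) would receive True for this triangle, while a GPU owning only region (3, 3) would receive False and could skip it.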
In another embodiment, geometry pre-testing for one geometry may be performed at the hardware rasterization stage. For example, the hardware scan converter may be configured to perform geometry pre-testing such that the scan converter generates information about all screen regions allocated to the plurality of GPUs for object rendering of the corresponding image frames.
In yet another embodiment, the geometry may be a primitive. That is, the portion of the object that is used for geometry pre-test may be a primitive. Thus, the information (e.g., hints) generated by one GPU during geometry pre-test indicates whether or not the various triangles (e.g., representing primitives) need to be rendered by another rendering GPU.
In one embodiment, the information generated during the geometry pre-test and shared among the GPUs for rendering includes the number of primitives (e.g., a surviving primitive count) that overlap any screen area allocated to the corresponding GPU for object rendering (e.g., in the corresponding group). The information may also include the number of vertices used to construct or define those primitives; that is, the information includes a surviving vertex count. Thus, at rendering time, the corresponding rendering GPU may use the provided vertex count to allocate space in the location cache and the parameter cache. For example, in one embodiment, no space is allocated for unneeded vertices, which may increase rendering efficiency.
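For illustration (representing primitives as vertex-index triples is an assumption), the surviving counts for one GPU might be gathered as:

```python
def survival_counts(primitives, prim_survives):
    """Count surviving primitives and the distinct vertices they reference,
    so the rendering GPU can size its location/parameter cache allocations."""
    surviving = [p for p in primitives if prim_survives(p)]
    vertices = {v for p in surviving for v in p}
    return len(surviving), len(vertices)

# Example: 4 triangles over 6 vertices; suppose the pre-test found that only
# the first two overlap this GPU's screen regions.
tris = [(0, 1, 2), (1, 2, 3), (3, 4, 5), (4, 5, 0)]
counts = survival_counts(tris, lambda p: p in [(0, 1, 2), (1, 2, 3)])
```

Here the GPU would reserve cache space for 2 primitives and 4 vertices instead of 4 primitives and 6 vertices.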
In other embodiments, the information (e.g., hints) generated during geometry pre-testing includes specific primitives (e.g., surviving primitives as exact matches) that overlap any screen area allocated to the corresponding GPU for object rendering (e.g., in the corresponding group). That is, the information generated for rendering the GPU includes a particular set of primitives for rendering. The information may also include specific vertices used to construct or define the primitives. That is, the information generated for rendering the GPU includes a particular set of vertices for rendering. For example, this information may save other rendering GPU time during its geometry processing stage when rendering geometry.
In other embodiments, there may be processing overhead (software or hardware) associated with generating information during geometry testing. In such cases, it may be beneficial to skip generating information for certain geometries. That is, the information provided as hints is generated for some objects and not for others. For example, a geometry (e.g., an object or a portion of an object) representing a skybox or a large piece of terrain may consist of large triangles. In that case, every GPU used for multi-GPU rendering of an image frame, or of each of one or more image frames in a sequence of image frames, may need to render these geometries. That is, information may or may not be generated depending on the nature of the corresponding geometry.
Fig. 9A-9C illustrate various policies for assigning screen regions to corresponding GPUs when multiple GPUs cooperatively render a single image, according to one embodiment of the disclosure. To achieve GPU processing efficiency, various techniques may be used in dividing the screen into regions, such as increasing or decreasing the number of regions (e.g., selecting the correct number of regions), interleaving regions, increasing or decreasing the number of regions used for interleaving, selecting a particular pattern when interleaving regions, and so forth. For example, the plurality of GPUs is configured to perform multi-GPU rendering of geometry for an application-generated image frame by pre-testing the geometry against interleaved screen regions prior to rendering objects in the corresponding image. The configurations of screen regions in fig. 9A-9C are designed to reduce any rendering time imbalance between the multiple GPUs. The complexity of the test (e.g., for overlap with corresponding screen regions) varies depending on how the screen regions are allocated to the GPUs. In the schematic diagrams shown in fig. 9A-9C, bold frame 910 is the outline of the corresponding screen or display used in rendering the image.
In one embodiment, the size of each of the plurality of screen regions is uniform. In another embodiment, each of the plurality of screen regions is non-uniform in size. In yet another embodiment, the number and size of screen regions in the plurality of screen regions dynamically change.
In particular, FIG. 9A illustrates a simple pattern 900A for screen 910. The size of each screen area is uniform. For example, each region may be a rectangle whose dimensions in pixels are powers of 2; for example, each region may be 256×256 pixels in size. As shown, the region allocation is a checkerboard pattern in which rows of A and B regions alternate with rows of C and D regions. Pattern 900A can be tested easily during geometry pre-testing. However, there may be some inefficiency in rendering. For example, the screen area allocated to each GPU differs substantially (i.e., the coverage of screen regions C and D in screen 910 is smaller), which may result in an imbalance in rendering time between the GPUs.
Fig. 9B illustrates a pattern 900B of screen regions for screen 910. The size of each screen region is uniform. Screen regions are allocated and distributed to reduce the rendering time imbalance between GPUs. For example, assigning GPUs to screen regions in pattern 900B may result in an approximately equal number of screen pixels being assigned to each GPU across screen 910. That is, screen regions are allocated to the GPUs to equalize screen area or coverage within screen 910. For example, if each region is 256x256 pixels in size, each GPU's set of regions has approximately the same coverage in screen 910. In particular, the set of screen areas A covers an area of 6×256×256 pixels, the set of screen areas B covers an area of 5.75×256×256 pixels, the set of screen areas C covers an area of 5.5×256×256 pixels, and the set of screen areas D covers an area of 5.5×256×256 pixels.
Fig. 9C illustrates a pattern 900C of screen areas for screen 910. The size of each screen area is not uniform. That is, the screen regions allocated to the GPUs for rendering objects may not be uniform in size. In particular, screen 910 is partitioned such that each GPU is assigned the same number of pixels. For example, if a 4K display (3840 x 2160) is divided vertically into four regions, the height of each region is 540 pixels. However, GPUs typically perform many operations on 32x32 pixel blocks, and 540 pixels is not a multiple of 32. Thus, in one embodiment, pattern 900C may include blocks that are 512 pixels in height (a multiple of 32), and other blocks that are 544 pixels in height (also a multiple of 32). Other embodiments may use blocks of different sizes. Pattern 900C shows an equal number of screen pixels allocated to each GPU through the use of non-uniform screen regions.
In yet another embodiment, the requirements of the application in performing image rendering vary over time, and the screen area is dynamically selected. For example, if it is known that most of the rendering time is spent on the lower half of the screen, it is advantageous to allocate the area in such a way that almost equal amounts of screen pixels of the lower half of the display are allocated to each GPU for rendering the corresponding image. That is, the area allocated to each GPU for rendering a corresponding image may dynamically change. For example, the change may be applied based on game mode, different games, screen size, mode selected for the region, and so forth.
Fig. 10 is a schematic diagram illustrating various distributions of GPU-to-geometry assignments for performing geometry pre-testing, according to one embodiment of the present disclosure. That is, FIG. 10 shows how responsibility for generating information during geometry pre-testing is distributed among the multiple GPUs. As previously described, each GPU is assigned a corresponding portion of the image frame geometry, where the portion may be further divided into objects, portions of objects, geometries, multiple geometries, and so forth. Geometry pre-testing includes determining whether a particular geometry overlaps any one or more screen regions allocated to a corresponding GPU for object rendering. In an implementation, geometry pre-testing is typically performed by the GPUs for all of the geometry of the corresponding image frame simultaneously. In this way, geometry testing is performed cooperatively by the GPUs, allowing each GPU to know which geometries to render, and which geometries to skip rendering, as previously described.
As shown in fig. 10, each geometry may be an object, a portion of an object, or the like. For example, the geometry may be part of an object, such as the size of the geometry being approximately the size of the allocated location cache and/or parameter cache, as previously described. Purely for illustration, object 0 (e.g., rendering specified by command 722 in rendering command buffer 700A) is partitioned into geometric figures "a", "B", "c", "d", "e", and "f", such as object 810 in fig. 8B. Likewise, object 1 (e.g., rendering specified by command 724 in rendering command buffer 700A) is partitioned into geometry "g", "h", and "i". Further, object 2 (e.g., rendering specified by command 724 in rendering command buffer 700A) is partitioned into geometries "j", "k", "l", "m", "n", and "o". To distribute responsibility for geometry testing to the GPU, the geometries may be ordered (e.g., a-o).
Distribution 1010 (e.g., the row ABCDABCDABCD...) shows a uniform distribution of responsibility for performing geometry testing among the multiple GPUs. In particular, rather than having one GPU take the first quarter of the geometries (e.g., in one block, such as GPU-A taking the first four of approximately 16, including "a", "b", "c", and "d", for geometry testing), a second GPU taking the second quarter, and so on, the assignments to GPUs are interleaved. That is, consecutive geometries are assigned to different GPUs. For example, geometry "a" is assigned to GPU-A, geometry "b" is assigned to GPU-B, geometry "c" is assigned to GPU-C, geometry "d" is assigned to GPU-D, geometry "e" is assigned to GPU-A, geometry "f" is assigned to GPU-B, geometry "g" is assigned to GPU-C, and so on. As a result, the processing of geometry testing is approximately balanced between the GPUs (e.g., GPU-A, GPU-B, GPU-C and GPU-D).
Distribution 1020 (e.g., the row ABBCDABBCDABBCD...) shows an asymmetric distribution of responsibility for performing geometry testing among the multiple GPUs. An asymmetric distribution may be advantageous when some GPUs have more time than others to perform geometry testing when rendering the corresponding image frame. For example, one GPU may have completed object rendering for a previous frame or frames of a scene earlier than the other GPUs, so (because it is expected to complete this frame earlier as well) it may be allocated more geometry for performing geometry testing. Again, the assignments to GPUs are interleaved. As shown, GPU-B is assigned more geometry for geometry pre-testing than the other GPUs. For example, geometry "a" is assigned to GPU-A, geometry "b" is assigned to GPU-B, geometry "c" is also assigned to GPU-B, geometry "d" is assigned to GPU-C, geometry "e" is assigned to GPU-D, geometry "f" is assigned to GPU-A, geometry "g" is assigned to GPU-B, geometry "h" is also assigned to GPU-B, geometry "i" is assigned to GPU-C, and so on. Although the distribution of geometry testing to the GPUs may be unbalanced, the combined processing of the full phase (e.g., geometry pre-test plus geometry rendering) may end up approximately balanced (e.g., each GPU spends approximately the same total time performing geometry pre-testing and geometry rendering).
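Both distributions amount to cycling the ordered geometries through a GPU pattern; a sketch (the pattern strings follow the rows described above, everything else is illustrative):

```python
def distribute(geometries, pattern):
    """Assign consecutive geometries to GPUs by cycling through `pattern`,
    e.g. "ABCD" for distribution 1010 or "ABBCD" for distribution 1020."""
    return {g: pattern[i % len(pattern)] for i, g in enumerate(geometries)}

uniform = distribute("abcdefg", "ABCD")        # a->A, b->B, c->C, d->D, e->A, ...
asymmetric = distribute("abcdefghi", "ABBCD")  # a->A, b->B, c->B, d->C, e->D, ...
```

Doubling a GPU's letter in the pattern doubles its share of the pre-test work, which is how GPU-B receives extra geometry in distribution 1020.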
Fig. 11A-11B illustrate the use of statistical data for one or more image frames in assigning responsibility for performing geometry testing between a plurality of GPUs. For example, some GPUs may process more or less geometry during geometry testing based on statistics to generate information useful in rendering.
In particular, fig. 11A is a schematic diagram illustrating geometry pre-testing and rendering of a previous image frame by multiple GPUs, and the use of statistical data collected during that rendering to influence the distribution of pre-testing of the current image frame's geometry to the multiple GPUs, according to one embodiment of the present disclosure. Purely by way of illustration, in the second frame 1100B of FIG. 11A, GPU-B processes twice as much geometry (e.g., during pre-testing) as the other GPUs (e.g., GPU-A, GPU-C and GPU-D). The distribution and allocation of more geometry to GPU-B for geometry pre-testing in the current image frame is based on statistical data collected during rendering of the previous image frame or frames.
For example, timing diagram 1100A shows geometry pre-test 701A and rendering 702A for a previous image frame, where four GPUs (e.g., GPU-A, GPU-B, GPU-C, and GPU-D) are used for both processes. The geometry (e.g., the plurality of geometries) of the previous image frame is distributed evenly among the GPUs, as indicated by the approximately equal time each GPU spends performing geometry pre-test 701A.
Rendering statistics collected from one or more image frames may be used to determine how to perform geometry testing and rendering of the current image frame. That is, the statistical data may serve as information used when performing geometry testing and rendering of a subsequent image frame (e.g., the current image frame). For example, statistics collected during rendering of the objects (e.g., geometries) of the previous image frame may indicate that GPU-B completed rendering earlier than the other GPUs. In particular, GPU-B has idle time 1130A after rendering the portion of the geometry that overlaps the screen regions allocated to GPU-B for object rendering (e.g., in its corresponding set). Each of the other GPUs (GPU-A, GPU-C, and GPU-D) performs rendering approximately until the end 710 of the corresponding frame period of the previous image frame.
When an application is executed, the previous image frame and the current image frame may be generated for a particular scene. Thus, the objects may be substantially similar in number and location from one image frame to the next. In this case, the time each GPU takes to perform geometry pre-test and rendering will be similar across the image frames in the sequence of image frames. That is, based on the statistical data, it is reasonable to expect that GPU-B will again have idle time when performing geometry testing and rendering of the current image frame. Thus, GPU-B may be allocated more geometry for geometry pre-testing in the current frame. For example, by having GPU-B process more geometry during geometry pre-testing, the result is that GPU-B completes at about the same time as the other GPUs after rendering the objects in the current image frame. That is, each of GPU-A, GPU-B, GPU-C, and GPU-D performs rendering approximately until the end 711 of the corresponding frame period of the current image frame. In one embodiment, the total time to render the current image frame is reduced, such that rendering the current image frame takes less time when rendering statistics are used. Thus, statistics from the rendering of the previous frame and/or previous frames may be used to adjust geometry pre-testing, such as by adjusting the distribution of the geometry (e.g., the plurality of geometries) among the GPUs for the current image frame.
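Purely by way of illustration, one way to bias the allocation using last frame's statistics is to give each GPU a share of the geometries proportional to its idle time in the previous frame; the function name, the proportional heuristic, and the fallback to an even split are illustrative assumptions, not the disclosed method.

```python
# Sketch: use per-GPU render times from the previous frame to decide how
# many geometries each GPU receives for pre-testing in the current frame.

def allocate_pretest_counts(total_geometries, render_times, frame_period):
    """Give each GPU a share proportional to its idle time last frame,
    falling back to an even split when no GPU was idle."""
    idle = [max(frame_period - t, 0.0) for t in render_times]
    if sum(idle) == 0:
        idle = [1.0] * len(render_times)  # no idle time observed: even split
    counts = [round(total_geometries * s / sum(idle)) for s in idle]
    counts[-1] += total_geometries - sum(counts)  # absorb rounding drift
    return counts

# GPU-B (index 1) was idle for half the 16 ms frame, so it gets the bulk
# of the 12 geometries to pre-test.
counts = allocate_pretest_counts(12, [15.0, 8.0, 15.0, 14.0], 16.0)
```

Note that, consistent with the description above, a GPU that had no idle time may receive few or no geometries for pre-testing under this heuristic.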
Fig. 11B is a flow chart 1100B illustrating a graphics processing method according to one embodiment of the present disclosure, including geometry pre-testing and rendering of a previous image frame by a plurality of GPUs, and the use of statistical data collected during that rendering to affect how pre-testing of the geometry of the current image frame is distributed among the plurality of GPUs. FIG. 11A is a schematic diagram illustrating the use of statistics in the method of flowchart 1100B to determine how the geometry (e.g., the plurality of geometries) of an image frame is allocated among the GPUs. As previously described, the various architectures may include multiple GPUs that cooperate to render a single image by performing multi-GPU rendering of geometry for an application, such as within one or more cloud game servers of a cloud game system, or within a stand-alone system (such as a personal computer or game console that includes a high-end graphics card with multiple GPUs), and so forth.
In particular, at 1110, the method includes rendering graphics for an application using a plurality of GPUs, as previously described. At 1120, the method includes dividing responsibility for rendering geometry of the graphics among the plurality of GPUs based on the plurality of screen regions. Each GPU has a corresponding division of responsibility known to the multiple GPUs. More specifically, each GPU is responsible for rendering geometry in a corresponding set of screen regions of the plurality of screen regions, wherein the corresponding set of screen regions includes one or more screen regions, as previously described. In one embodiment, the screen regions are staggered (e.g., when the display is divided into groups of screen regions for geometric pre-testing and rendering).
At 1130, the method includes rendering, at the plurality of GPUs, a first plurality of geometries of a previous image frame generated by the application. For example, timing diagram 1100A illustrates the timing of performing geometry testing and rendering of the objects (e.g., geometries) of the previous image frame. At 1140, the method includes generating statistics for the rendering of the previous image frame. That is, the statistics may be collected while rendering the previous image frame.
At 1150, the method includes assigning, based on the statistics, a second plurality of geometries of the current image frame generated by the application to the plurality of GPUs for geometry testing. That is, the statistics may be used to assign the same number of, fewer, or more geometries for geometry testing to a particular GPU when rendering the next or current image frame. In some cases, the statistics may indicate that the geometries of the second plurality of geometries should be evenly distributed among the plurality of GPUs when performing geometry testing.
In other cases, the statistics may indicate that the geometries of the second plurality of geometries should be unevenly distributed among the plurality of GPUs when performing geometry testing. For example, as shown in timing diagram 1100A, the statistics may indicate that GPU-B completed rendering before any of the other GPUs in the previous image frame. In particular, it may be determined that a first GPU (e.g., GPU-B) completes rendering its portion of the first plurality of geometries before a second GPU (e.g., GPU-A) completes rendering its portion of the first plurality of geometries. As previously described, the first GPU (e.g., GPU-B) renders the one or more of the first plurality of geometries that overlap any screen region allocated to the first GPU for object rendering, and the second GPU (e.g., GPU-A) renders the one or more of the first plurality of geometries that overlap any screen region allocated to the second GPU for object rendering. Thus, because it is expected, based on the statistics, that the first GPU (e.g., GPU-B) will require less time than the second GPU (e.g., GPU-A) to render the second plurality of geometries, more geometries may be allocated to the first GPU for geometry pre-testing when rendering the current image frame. For example, a first number of the second plurality of geometries may be assigned to the first GPU (e.g., GPU-B) for geometry testing, and a second number of the second plurality of geometries may be assigned to the second GPU (e.g., GPU-A) for geometry testing, where the first number is higher than the second number (if the time imbalance is large enough, GPU-A may not be assigned any geometries at all). In this way, GPU-B processes more geometry during geometry testing than GPU-A. For example, timing diagram 1100B shows that GPU-B has been assigned more geometry and spends more time performing geometry testing than the other GPUs.
At 1160, the method includes performing a geometry pre-test on the second plurality of geometries for the current image frame to generate information about each of the second plurality of geometries and its relationship to each of the plurality of screen regions. The geometry pre-test is performed at each of the plurality of GPUs based on the allocation; that is, each GPU performs the geometry pre-test on its allocated portion of the second plurality of geometries of the image frame generated by the application to generate the information about each geometry and its relationship to each of the plurality of screen regions.
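A minimal sketch of the kind of information the pre-test may produce, assuming each geometry is reduced to an axis-aligned screen-space bounding box tested against rectangular screen regions; the rectangle representation `(x0, y0, x1, y1)` and the function names are illustrative assumptions.

```python
# Sketch: for each geometry, record which screen regions its screen-space
# bounding box overlaps. This per-geometry/per-region relationship is the
# "information" later used to render or skip the geometry at each GPU.

def rects_overlap(a, b):
    """Axis-aligned rectangle overlap test; rects are (x0, y0, x1, y1)."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def pretest(geometry_bounds, screen_regions):
    """Return, per geometry, the set of region indices it overlaps."""
    return [
        {r for r, region in enumerate(screen_regions) if rects_overlap(box, region)}
        for box in geometry_bounds
    ]

# Two 10x10 regions side by side; geometry 0 straddles both, geometry 1
# falls entirely within the second region.
regions = [(0, 0, 10, 10), (10, 0, 20, 10)]
info = pretest([(5, 5, 15, 8), (12, 2, 18, 6)], regions)
```

In practice the pre-test described above may operate at finer granularity (e.g., per primitive), but the output has this shape: a mapping from each geometry to the screen regions it touches.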
At 1170, the method includes, during a rendering phase, rendering the second plurality of geometries using the information generated for each of the second plurality of geometries (e.g., including fully rendering a geometry at the corresponding GPU or skipping the rendering of the geometry). In one embodiment, rendering is performed approximately simultaneously at each GPU. In particular, the plurality of geometries of the current image frame is rendered at each of the plurality of GPUs using the information generated for each of the geometries.
In other embodiments, the distribution of geometry to the GPUs for generating the information is dynamically adjusted. That is, the allocation of the geometry of the current image frame used to perform the geometry pre-test may be dynamically adjusted during rendering of the current image frame. For example, in the example of timing diagram 1100B, it may be determined that GPU-A is performing geometry pre-testing on its assigned geometry at a slower rate than expected. Thus, geometry assigned to GPU-A for geometry pre-testing may be reassigned on the fly, such as by reassigning a geometry from GPU-A to GPU-B, so that GPU-B now has the task of performing geometry pre-testing on that geometry during the frame period used to render the current image frame.
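The dynamic adjustment described above can be sketched as moving pending work from a slow GPU's pre-test queue to a faster GPU mid-frame; the queue-based model and names are illustrative assumptions about one possible software-side realization.

```python
from collections import deque

# Sketch: mid-frame rebalancing of geometry pre-test work. Each GPU has a
# queue of geometry ids still awaiting pre-testing; untested work is moved
# from the tail of the slow GPU's queue to the fast GPU's queue.

def rebalance(queues, slow_gpu, fast_gpu, n):
    """Move up to n pending geometries from slow_gpu's queue to fast_gpu's."""
    moved = 0
    while moved < n and queues[slow_gpu]:
        queues[fast_gpu].append(queues[slow_gpu].pop())  # steal from the tail
        moved += 1
    return moved

# GPU-A is behind with geometries 1-3 still pending; two of them are
# reassigned to GPU-B.
queues = {"A": deque([1, 2, 3]), "B": deque([4])}
moved = rebalance(queues, "A", "B", 2)
```

Stealing from the tail of the queue leaves the slow GPU working on the geometries it was about to process, minimizing disruption to in-flight work.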
Fig. 12A-12B illustrate another strategy for processing a render command buffer. Previously, a strategy was described with reference to fig. 7A-7C, wherein a command buffer contains commands for geometry pre-testing an object (e.g., a geometry) followed by commands for rendering the object (e.g., geometry). FIGS. 12A-12B illustrate geometry pre-test and rendering strategies using shaders capable of performing either operation according to a GPU configuration.
In particular, fig. 12A is a schematic diagram illustrating the use of a shader configured to perform both pre-testing and rendering of geometry of an image frame in two passes through a portion of command buffer 1200A, according to one embodiment of the disclosure. That is, the shader that is used to execute the commands in command buffer 1200A may be configured to perform geometry pre-testing when properly configured, or to perform rendering when properly configured.
As shown, the portion of command buffer 1200A shown in fig. 12A is executed twice, each execution resulting in a different action; the first execution results in performance of the geometry pre-test, while the second execution results in performance of geometry rendering. This may be accomplished in a number of ways. For example, the portion of the command buffer described in 1200A may be explicitly called twice as a subroutine, with different states (e.g., register settings or values in RAM) explicitly set to different values prior to each call. Alternatively, the portion of the command buffer described in 1200A may be implicitly executed twice, for example by marking the beginning and end of the portion with a special command to execute it twice, and implicitly setting different configurations (e.g., register settings) for the first and second executions of that portion of the command buffer. When a command in the portion of command buffer 1200A is executed (e.g., a command to set state or a command to execute a shader), the result of the command differs based on the GPU state (e.g., resulting in performance of the geometry pre-test or performance of rendering). That is, commands in command buffer 1200A may be configured for geometry pre-testing or for rendering. In particular, the portion of command buffer 1200A includes commands for configuring the state of the one or more GPUs executing commands from rendering command buffer 1200A, and commands for executing shaders that perform geometry pre-testing or rendering depending on that state. For example, each of commands 1210, 1212, 1214, and 1216 is used to configure the state of the one or more GPUs so as to execute a shader that performs geometry pre-testing or rendering depending on the state. As shown, command 1210 configures the GPU state so that shader 0 may execute via command 1211 to perform geometry pre-testing or rendering.
Likewise, command 1212 configures the GPU state so that shader 1 may execute via command 1213 to perform geometry pre-testing or rendering. In addition, command 1214 configures the GPU state so that shader 2 may execute via command 1215 to perform geometry pre-testing or rendering. Finally, command 1216 configures the GPU state so that shader 3 may execute via command 1217 to perform geometry pre-testing or rendering.
On the first pass 1291 through command buffer 1200A, the corresponding shader performs geometry pre-testing based on the GPU states explicitly or implicitly set as described above and the GPU states configured by commands 1210, 1212, 1214, and 1216. For example, shader 0 is configured to perform geometry pre-testing on object 0 (e.g., one geometry) (e.g., based on the object shown in fig. 7B-1), shader 1 is configured to perform geometry pre-testing on object 1, shader 2 is configured to perform geometry pre-testing on object 2, and shader 3 is configured to perform geometry pre-testing on object 3.
In one implementation, commands may be skipped or interpreted differently based on the GPU state. For example, certain commands (portions of 1210, 1212, 1214, and 1216) that set state may be skipped based on the GPU state explicitly or implicitly set as described above; for instance, if shader 0 executing via command 1210 requires less GPU state to be configured for geometry pre-testing than for geometry rendering, it may be beneficial to skip setting the unnecessary portions of the GPU state, because GPU state setting may create overhead. As another example, certain commands (portions of 1210, 1212, 1214, and 1216) that set state may be interpreted differently based on the GPU state explicitly or implicitly set as described above, for instance if shader 0 executing via command 1210 requires a GPU state configured differently for geometry pre-testing than for geometry rendering, or requires different inputs for geometry pre-testing than for geometry rendering.
In one embodiment, a shader configured for geometry pre-testing does not allocate space in the position cache and the parameter cache, as previously described. In another embodiment, a single shader is used to perform either the pre-test or the rendering. This may be done in a number of ways, such as via an external hardware state that the shader can check (e.g., explicitly or implicitly set as described above), or via an input to the shader (e.g., set by commands interpreted differently in the first and second passes through the command buffer).
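The two-pass execution of the same command buffer portion can be sketched as follows; the Python model, the `PRETEST`/`RENDER` mode values, and the `render_only` flag are illustrative assumptions rather than the disclosed hardware mechanism. The sketch also models the skipping of render-only state commands during the pre-test pass described above.

```python
# Sketch: the same command list is executed twice; a mode value set outside
# the buffer decides whether each shader dispatch performs the geometry
# pre-test or the rendering.

PRETEST, RENDER = 0, 1

def run_shader(obj, mode, log):
    log.append(("pretest" if mode == PRETEST else "render", obj))

def execute(command_buffer, mode, log):
    for kind, arg in command_buffer:
        if kind == "set_state" and mode == PRETEST and arg.get("render_only"):
            continue  # render-only state setup is skipped during pre-test
        if kind == "dispatch":
            run_shader(arg, mode, log)

commands = [
    ("set_state", {"shader": 0}), ("dispatch", 0),
    ("set_state", {"shader": 1, "render_only": True}), ("dispatch", 1),
]
log = []
execute(commands, PRETEST, log)  # first pass: pre-test all objects
execute(commands, RENDER, log)   # second pass: render all objects
```

A single dispatch command thus produces different behavior on each pass, driven entirely by state set outside the command buffer portion.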
In a second pass 1292 through command buffer 1200A, the corresponding shader performs rendering of the geometry of the corresponding image frame based on the GPU state explicitly or implicitly set as described above, as well as the GPU state configured by commands 1210, 1212, 1214, and 1216. For example, shader 0 is configured to perform rendering of object 0 (e.g., a geometry) (e.g., based on the object shown in fig. 7B-1). Also, the shader 1 is configured to perform rendering of the object 1, the shader 2 is configured to perform rendering of the object 2, and the shader 3 is configured to perform rendering of the object 3.
Fig. 12B is a flow chart 1200B illustrating a graphics processing method according to one embodiment of the present disclosure, including performing both pre-testing and rendering of the geometry of an image frame using the same set of shaders in two passes through a portion of a command buffer. As previously described, the various architectures may include multiple GPUs that cooperate to render a single image by performing multiple GPU rendering of geometry for an application, such as within one or more cloud game servers of a cloud game system, or within a stand-alone system (such as a personal computer or game console that includes a high-end graphics card with multiple GPUs), and so forth.
In particular, at 1210, the method includes rendering graphics for an application using a plurality of GPUs, as previously described. At 1220, the method includes dividing responsibility for rendering geometry of the graphics among the plurality of GPUs based on the plurality of screen regions. Each GPU has a corresponding division of responsibility known to the multiple GPUs. More specifically, each GPU is responsible for rendering geometry in a corresponding set of screen regions of the plurality of screen regions, wherein the corresponding set of screen regions includes one or more screen regions, as previously described. In one embodiment, the screen regions are staggered (e.g., when the display is divided into groups of screen regions for geometric pre-testing and rendering).
At 1230, the method includes assigning a plurality of geometries of the image frame to the plurality of GPUs for geometry testing. In particular, each of the plurality of GPUs is assigned a corresponding portion of the geometry associated with the image frame for geometry testing. As previously mentioned, in embodiments, the distribution of the geometries may be uniform or non-uniform, where each portion includes one or more geometries or possibly no geometries at all.
At 1240, the method includes loading a first GPU state configuring one or more shaders to perform geometry pre-testing. For example, depending on the GPU state, the corresponding shader may be configured to perform different operations. Thus, the first GPU state configures the corresponding shader to perform geometry pre-testing. In the example of fig. 12A, this may be set in a variety of ways, such as by setting the state explicitly or implicitly outside of the portion of the command buffer depicted in 1200A, as described above. In particular, the GPU state may be set in a variety of ways. For example, the CPU or GPU may set a value in Random Access Memory (RAM) where the GPU will check the value in RAM. In another example, the state may be internal to the GPU, such as when the command buffer is called twice as a subroutine, the internal GPU state being different between the two subroutine calls. Alternatively, the command 1210 in fig. 12A may be interpreted or skipped differently based on the state set explicitly or implicitly as described above. Based on this first GPU state, shader 0, executed by command 1211, is configured to perform geometry pre-testing.
At 1250, the method includes performing the geometry pre-test on the plurality of geometries at the plurality of GPUs to generate information about each geometry and its relationship to each of the plurality of screen regions. As previously described, the geometry pre-test may determine whether a geometry overlaps any screen region (e.g., in the corresponding set) allocated to the corresponding GPU for object rendering. Because, in an implementation, the geometry pre-test is typically performed concurrently by the GPUs for all geometries of the corresponding image frame, each GPU is able to know which geometries to render and which geometries to skip. This concludes the first pass through the command buffer, in which the shaders, configurable for either geometry pre-testing or rendering according to the GPU state, perform the geometry pre-test.
At 1260, the method includes loading a second GPU state that configures one or more shaders to perform rendering. As previously described, depending on the GPU state, the corresponding shader may be configured to perform different operations. Thus, the second GPU state configures the corresponding shader (the same shader that was previously used to perform geometry pre-testing) to perform rendering. In the example of fig. 12A, based on the second GPU state, shader 0, executed by command 1211, is configured to perform rendering.
At 1270, the method includes using the information generated for each of the plurality of geometries at each of the plurality of GPUs when rendering the plurality of geometries (e.g., including fully rendering a geometry at the corresponding GPU or skipping the rendering of the geometry). As previously described, the information may indicate whether a geometry overlaps any screen region (e.g., in the corresponding set) allocated to the corresponding GPU for object rendering. The information may be used when rendering each of the plurality of geometries at each of the plurality of GPUs, such that each GPU may efficiently render only the geometries that overlap at least one screen region (e.g., in the corresponding set) assigned to the corresponding GPU for object rendering. This concludes the second pass through the command buffer, in which the shaders, configurable for either geometry pre-testing or rendering according to the GPU state, perform the rendering.
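The render-or-skip decision described above can be sketched as an intersection between each geometry's overlap set (as produced by the pre-test) and the set of screen regions for which a given GPU is responsible; the data shapes and function name are illustrative assumptions.

```python
# Sketch: each GPU renders only the geometries whose pre-test overlap set
# intersects its own screen regions, and skips the rest.

def render_for_gpu(gpu_regions, overlap_info):
    """Return the geometry indices this GPU renders (others are skipped)."""
    return [g for g, regions in enumerate(overlap_info) if regions & gpu_regions]

# This GPU is responsible for regions {0, 2}; the pre-test found that
# geometry 0 touches only region 1, while geometry 1 touches regions 0 and 3.
to_render = render_for_gpu({0, 2}, [{1}, {0, 3}])
```

Geometry 0 is skipped entirely at this GPU because it overlaps none of the GPU's regions, which is the efficiency gain the pre-test provides.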
Fig. 13A-13B illustrate another strategy for processing a render command buffer. Previously, a strategy was described with reference to fig. 7A-7C, wherein the command buffer contains commands for geometry pre-testing an object (e.g., a geometry) followed by commands for rendering the object (e.g., the geometry), and another strategy, using shaders capable of performing either operation depending on the GPU configuration, was described with reference to fig. 12A-12B. Fig. 13A-13B illustrate geometry testing and rendering strategies using shaders capable of performing either geometry pre-testing or rendering, and wherein the geometry pre-testing and rendering are interleaved for different sets of geometries, according to embodiments of the present disclosure.
In particular, fig. 13A is a schematic diagram illustrating the use of shaders configured to perform both geometry pre-testing and rendering, where the geometry pre-testing and rendering performed for different sets of geometries are interleaved using separate portions of a corresponding command buffer 1300A, according to one embodiment of the present disclosure. That is, rather than executing a portion of command buffer 1300A from beginning to end, command buffer 1300A is dynamically configured and executed such that geometry pre-tests and renderings are interleaved for different sets of geometries. For example, in the command buffer, some shaders (e.g., executed via commands 1311 and 1313) are configured to perform geometry pre-testing on a first set of geometries, and after performing the geometry pre-testing, those same shaders (e.g., executed via commands 1311 and 1313) are then configured to perform rendering of the first set of geometries. After rendering is performed on the first set of geometries, other shaders in the command buffer (e.g., executed via commands 1315 and 1317) are configured to perform geometry pre-testing on a second set of geometries, and after performing the geometry pre-testing, those same shaders (e.g., executed via commands 1315 and 1317) are then configured to perform rendering, and rendering is performed on the second set of geometries using those commands. The benefit of this strategy is that imbalances between GPUs can be dynamically resolved, such as by using asymmetric interleaving of geometry testing throughout the rendering process. An example of an asymmetric distribution of geometry testing was previously described with reference to distribution 1020 of fig. 10.
Since the interleaving of geometry pre-testing and rendering occurs dynamically, the configuration of the GPU occurs implicitly (e.g., via register settings or values in RAM); that is, one aspect of the GPU configuration occurs outside of the command buffer. For example, a GPU register may be set to 0 (indicating that geometry pre-testing should occur) or 1 (indicating that rendering should occur); the interleaved traversal of the command buffer and the setting of this register may be controlled by the GPUs based on the number of objects processed, primitives processed, imbalances between GPUs, and so forth. Alternatively, values in RAM may be used. As a result of this external configuration (meaning configuration set outside of the command buffer), when a command in the portion of command buffer 1300A is executed (e.g., a command to set state or a command to execute a shader), the result of the command differs based on the GPU state (e.g., resulting in performance of the geometry pre-test or performance of rendering). That is, commands in command buffer 1300A may be configured for geometry pre-test 1391 or rendering 1392. In particular, the portion of command buffer 1300A includes commands for configuring the state of the one or more GPUs executing commands from rendering command buffer 1300A, and commands for executing shaders that perform geometry pre-testing or rendering depending on that state. For example, each of commands 1310, 1312, 1314, and 1316 is used to configure the state of the GPU in order to execute a shader that performs geometry pre-testing or rendering depending on the state. As shown, command 1310 configures the GPU state so that shader 0 may be executed via command 1311 to perform geometry pre-testing or rendering of object 0. Likewise, command 1312 configures the GPU state so that shader 1 may be executed via command 1313 to perform geometry pre-testing or rendering of object 1.
Likewise, command 1314 configures the GPU state so that shader 2 may be executed via command 1315 to perform geometry pre-testing or rendering of object 2. Further, command 1316 configures the GPU state so that shader 3 may be executed via command 1317 to perform geometry pre-testing or rendering of object 3.
The geometry pre-testing and rendering may be interleaved for different sets of geometries. For illustration purposes only, command buffer 1300A may first be configured to perform geometry pre-testing and rendering of objects 0 and 1, and then command buffer 1300A may be configured to perform geometry pre-testing and rendering of objects 2 and 3. It will be appreciated that different numbers of geometries may be interleaved in different segments. For example, segment 1 shows a first partial traversal through command buffer 1300A. Based on the GPU state implicitly set as described above and the GPU state configured by commands 1310 and 1312, the corresponding shaders perform geometry pre-testing. For example, shader 0 is configured to perform geometry pre-testing on object 0 (e.g., a geometry) (e.g., based on the objects shown in fig. 7B-1), and shader 1 is configured to perform geometry pre-testing on object 1. Segment 2 shows a second partial traversal through command buffer 1300A. Based on the GPU state implicitly set as described above and the GPU state configured by commands 1310 and 1312, the corresponding shaders perform rendering. For example, shader 0 is now configured to perform rendering of object 0, and shader 1 is now configured to perform rendering of object 1.
The interleaving of geometry pre-testing and rendering for different sets of geometries is shown in fig. 13A. In particular, segment 3 shows a third partial traversal through command buffer 1300A. Based on the GPU state implicitly set as described above and the GPU state configured by commands 1314 and 1316, the corresponding shaders perform geometry pre-testing. For example, shader 2 (executing via command 1315) performs the geometry pre-test on object 2 (e.g., a geometry) (e.g., based on the objects shown in fig. 7B-1), and shader 3 (executing via command 1317) performs the geometry pre-test on object 3. Segment 4 shows a fourth partial traversal through command buffer 1300A. Based on the GPU state implicitly set as described above and the GPU state configured by commands 1314 and 1316, the corresponding shaders perform rendering. For example, shader 2 (executing via command 1315) performs rendering of object 2, and shader 3 (executing via command 1317) performs rendering of object 3.
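Purely by way of illustration, the interleaved segment order described above can be sketched as a schedule that runs a pre-test pass and then a render pass over each set of objects before moving to the next set; the function name and the (phase, object) tuple shape are illustrative assumptions.

```python
# Sketch: interleaved traversal of the command buffer. Each set of objects
# gets a pre-test segment followed by a render segment before the next set
# is processed (segments 1-4 in FIG. 13A).

def interleaved_passes(object_sets):
    """Yield (phase, object) pairs: pre-test then render, per set."""
    schedule = []
    for objects in object_sets:
        schedule += [("pretest", obj) for obj in objects]  # e.g., segment 1
        schedule += [("render", obj) for obj in objects]   # e.g., segment 2
    return schedule

# Objects 0-1 in the first set, 2-3 in the second, as in FIG. 13A.
order = interleaved_passes([[0, 1], [2, 3]])
```

With unequal set sizes, the same schedule realizes the asymmetric interleaving mentioned above, since a GPU can pre-test a larger or smaller batch per segment depending on observed imbalance.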
Note that the hardware contexts are preserved, or saved and restored. For example, the geometry pre-test GPU context at the end of segment 1 is needed to perform the geometry pre-test at the beginning of segment 3. Likewise, the rendering GPU context at the end of segment 2 is needed to perform rendering at the beginning of segment 4.
In one implementation, commands may be skipped or interpreted differently based on the GPU state. For example, certain commands (portions of 1310, 1312, 1314, and 1316) that set state may be skipped based on the GPU state implicitly set as described above; for instance, if shader 0 executing via command 1310 requires less GPU state to be configured for geometry testing than for geometry rendering, it may be beneficial to skip setting the unnecessary portions of the GPU state, because GPU state setting may incur overhead. As another example, certain commands (portions of 1310, 1312, 1314, and 1316) that set state may be interpreted differently based on the GPU state implicitly set as described above, for instance if shader 0 executing via command 1310 requires a GPU state configured differently for geometry testing than for geometry rendering, or requires different inputs for geometry testing than for geometry rendering.
In one embodiment, a shader configured for geometry pre-testing does not allocate space in the position cache and the parameter cache, as previously described. In another embodiment, a single shader is used to perform either the pre-test or the rendering. This may be done in a number of ways, such as via an external hardware state that the shader can check (e.g., implicitly set as described above), or via an input to the shader (e.g., set by commands interpreted differently in the different traversals of the command buffer).
Fig. 13B is a flow chart illustrating a graphics processing method according to one embodiment of the present disclosure, including interleaving pre-testing and rendering of the geometry of an image frame for different sets of geometries using separate portions of corresponding command buffers. As previously described, the various architectures may include multiple GPUs that cooperate to render a single image by performing multiple GPU rendering of geometry for an application, such as within one or more cloud game servers of a cloud game system, or within a stand-alone system (such as a personal computer or game console that includes a high-end graphics card with multiple GPUs), and so forth.
In particular, at 1310, the method includes rendering graphics for an application using a plurality of GPUs, as previously described. At 1320, the method includes dividing responsibility for rendering geometry of the graphics among the plurality of GPUs based on the plurality of screen regions. Each GPU has a corresponding division of responsibility known to the multiple GPUs. More specifically, each GPU is responsible for rendering geometry in a corresponding set of screen regions of the plurality of screen regions, wherein the corresponding set of screen regions includes one or more screen regions, as previously described. In one embodiment, the screen regions are staggered (e.g., when the display is divided into groups of screen regions for geometric pre-testing and rendering).
At 1330, the method includes assigning a plurality of geometries of the image frame to the plurality of GPUs for geometry testing. In particular, each of the plurality of GPUs is assigned a corresponding portion of the geometry associated with the image frame for geometry testing. As previously mentioned, the distribution of the geometries may be uniform or non-uniform, where each portion includes one or more geometries or possibly no geometries at all.
At 1340, the method includes interleaving a first set of shaders with a second set of shaders in a command buffer, wherein the shaders are configured to perform both geometry pre-testing and rendering. In particular, the first set of shaders is configured to perform geometry pre-testing and rendering on the first set of geometries. Thereafter, the second set of shaders is configured to perform geometry pre-testing and rendering on the second set of geometries. As previously described, the geometry pre-test generates corresponding information about each geometry in the first or second set and its relationship to each of the plurality of screen regions. The plurality of GPUs render each geometry in the first or second group using the corresponding information. As previously described, the GPU state may be set in a variety of ways to perform geometry pre-testing or rendering. For example, the CPU or GPU may set a value in Random Access Memory (RAM) where the GPU will check the value in RAM. In another example, the state may be internal to the GPU, such as when the command buffer is called twice as a subroutine, the internal GPU state being different between the two subroutine calls.
The interleaving process is further described. In particular, the first set of shaders of the command buffer is configured to perform geometry pre-testing on the first set of geometries, as previously described. The geometry pre-test is performed on the first set of geometries at the plurality of GPUs to generate first information about each geometry in the first set and its relationship to each of the plurality of screen regions. The first set of shaders is then configured to perform rendering of the first set of geometries, as previously described. Thereafter, the first information is used when rendering the plurality of geometries at each of the plurality of GPUs (e.g., fully rendering a geometry of the first set at the corresponding GPU or skipping its rendering). As previously described, this information indicates which geometries overlap the screen regions allocated to the corresponding GPUs for object rendering. For example, when the information indicates that a geometry does not overlap any screen region allocated to a GPU for object rendering (e.g., in the corresponding set), the information may be used to skip rendering that geometry at the GPU.
The second set of shaders is then used for geometry pre-testing and rendering of the second set of geometries. In particular, the second set of shaders of the command buffer is configured to perform geometry pre-testing on the second set of geometries, as previously described. Then, the geometry pre-test is performed on the second set of geometries at the plurality of GPUs to generate second information about each geometry in the second set and its relationship to each of the plurality of screen regions. The second set of shaders is then configured to perform rendering of the second set of geometries, as previously described. Thereafter, rendering of the second set of geometries is performed at each of the plurality of GPUs using the second information. As previously described, this information indicates which geometries overlap the screen regions (e.g., of the corresponding group) allocated to the corresponding GPUs for object rendering.
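Claim 1 characterizes the pre-test information as a hint whose usefulness depends on when it arrives: it is honored only if received before the rendering GPU begins rendering the geometry; otherwise the geometry is fully rendered. That timing rule can be stated compactly (a sketch with illustrative names):

```python
def decide(geometry_id, hints, rendering_started, my_regions):
    # Pre-test information is only a hint. If it arrives before this
    # GPU begins rendering the geometry, it is considered; once
    # rendering has begun, the geometry is rendered fully regardless.
    if rendering_started:
        return "render"
    hint = hints.get(geometry_id)
    if hint is not None and not (hint & my_regions):
        return "skip"   # known to overlap none of this GPU's regions
    return "render"     # overlaps our regions, or no hint received yet
```

For example, a geometry known to touch only region 2 is skipped by a GPU responsible for regions 0 and 1, but rendered by that same GPU if the hint arrives too late or has not arrived at all.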
Although the plurality of GPUs are described above as processing geometry in lockstep (i.e., the GPUs perform geometry pre-testing and then the GPUs perform rendering), in some embodiments the GPUs are not explicitly synchronized with each other; e.g., one GPU may be rendering a first set of geometries while a second GPU is performing geometry pre-testing on a second set of geometries.
Fig. 14 illustrates components of an example device 1400 that can be used to perform aspects of various embodiments of the present disclosure. For example, fig. 14 illustrates an exemplary hardware system suitable for multi-GPU rendering of geometry for an application by pre-testing the geometry against screen regions, which may be interlaced, prior to rendering objects of an image frame, according to an embodiment of the present disclosure. The block diagram illustrates a device 1400 that may be incorporated into, or may itself be, a personal computer, a server computer, a game console, a mobile device, or another digital device, each of which is suitable for practicing an embodiment of the disclosure. The device 1400 includes a central processing unit (CPU) 1402 for running software applications and, optionally, an operating system. CPU 1402 may comprise one or more homogeneous or heterogeneous processing cores.
According to various embodiments, CPU 1402 is one or more general-purpose microprocessors having one or more processing cores. Further embodiments may be implemented using one or more CPUs having a microprocessor architecture particularly suited for highly parallel and computationally intensive applications such as media and interactive entertainment applications configured for graphics processing during execution of games.
Memory 1404 stores applications and data for use by CPU 1402 and GPU 1416. Storage 1406 provides non-volatile storage and other computer readable media for applications and data and may include fixed disk drives, removable disk drives, flash memory devices and CD-ROMs, DVD-ROMs, blu-ray discs, HD-DVDs, or other optical storage devices, as well as signal transmission and storage media. User input devices 1408 communicate user input from one or more users to device 1400, examples of which may include a keyboard, mouse, joystick, touchpad, touch screen, still or video recorder/camera, and/or microphone. The network interface 1409 allows the device 1400 to communicate with other computer systems via electronic communication networks, and may include wired or wireless communications over local area networks and wide area networks such as the internet. The audio processor 1412 is adapted to generate analog or digital audio output from instructions and/or data provided by the CPU 1402, memory 1404, and/or storage device 1406. The components of device 1400, including CPU 1402, a graphics subsystem including GPU 1416, memory 1404, data storage 1406, user input device 1408, network interface 1409, and audio processor 1412, are connected via one or more data buses 1422.
Graphics subsystem 1414 is further connected with data bus 1422 and the components of device 1400. Graphics subsystem 1414 includes at least one Graphics Processing Unit (GPU) 1416 and a graphics memory 1418. The graphics memory 1418 includes a display memory (e.g., a frame buffer) for storing pixel data for each pixel of the output image. Graphics memory 1418 may be integrated in the same device as GPU 1416, connected with GPU 1416 as a separate device, and/or implemented within memory 1404. Pixel data may be provided directly from CPU 1402 to graphics memory 1418. Alternatively, the CPU 1402 may provide data and/or instructions defining a desired output image to the GPU 1416, from which the GPU 1416 generates pixel data for one or more output images. Data and/or instructions defining a desired output image may be stored in memory 1404 and/or graphics memory 1418. In an embodiment, the GPU 1416 includes 3D rendering capabilities for generating pixel data of an output image from instructions and data defining geometry, lighting, shading, texturing, motion, and/or camera parameters of a scene. The GPU 1416 may also include one or more programmable execution units capable of executing shader programs.
The graphics subsystem 1414 periodically outputs pixel data for an image from graphics memory 1418 for display on display device 1410 or projection by a projection system (not shown). Display device 1410 may be any device capable of displaying visual information in response to a signal from device 1400, including CRT, LCD, plasma, and OLED displays. The device 1400 may provide, for example, analog or digital signals to the display device 1410.
Other embodiments for optimizing graphics subsystem 1414 may include multi-GPU rendering of geometry for an application by pre-testing the geometry against screen regions, which may be interlaced, prior to rendering objects of an image frame. Graphics subsystem 1414 may be configured as one or more processing devices.
For example, in one embodiment, graphics subsystem 1414 may be configured to perform multi-GPU rendering of geometry for an application, where multiple graphics subsystems may implement graphics and/or rendering pipelines for a single application. That is, the graphics subsystem 1414 includes a plurality of GPUs for rendering each of one or more images in an image or sequence of images when the application is executed.
In other embodiments, graphics subsystem 1414 includes multiple GPU devices that are combined to perform graphics processing for a single application executing on a corresponding CPU. For example, the multiple GPUs may perform multi-GPU rendering of geometry for an application by pre-testing the geometry against screen regions, which may be interlaced, prior to rendering objects of an image frame. In other examples, the multiple GPUs may perform an alternating form of frame rendering, where in sequential frame periods GPU 1 renders a first frame and GPU 2 renders a second frame, and so on until the last GPU is reached, whereupon the initial GPU renders the next video frame (e.g., if there are only two GPUs, GPU 1 renders the third frame). That is, the GPUs rotate when rendering frames. The rendering operations may overlap, where GPU 2 may begin rendering the second frame before GPU 1 completes rendering the first frame. In another embodiment, the multiple GPU devices may be assigned different shader operations in the rendering and/or graphics pipeline, with a master GPU performing the main rendering and compositing. For example, in a group of three GPUs, master GPU 1 may perform the main rendering (e.g., a first shader operation) and composite the outputs from slave GPU 2 and slave GPU 3, where slave GPU 2 may perform a second shader operation (e.g., a fluid effect, such as a river) and slave GPU 3 may perform a third shader operation (e.g., particle smoke), with master GPU 1 compositing the results from each of GPU 1, GPU 2, and GPU 3. In this way, different GPUs may be allocated to perform different shader operations (e.g., flag waving, wind, smoke generation, fire, etc.) to render a video frame. In yet another embodiment, each of the three GPUs may be assigned to a different object and/or portion of a scene corresponding to a video frame.
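The round-robin alternate frame rendering described above (GPU 1 renders the first frame, GPU 2 the second, and so on, wrapping back to GPU 1) reduces to a modulo assignment. The sketch below uses zero-based indices, while the text's example uses one-based numbering:

```python
def gpu_for_frame(frame_index, num_gpus):
    # Alternate frame rendering: GPUs take frames in rotation,
    # wrapping back to the first GPU after the last one.
    return frame_index % num_gpus
```

With two GPUs, GPU 0 renders frames 0, 2, 4, ... and GPU 1 renders frames 1, 3, 5, ..., matching the text's example in which, with only two GPUs, the first GPU renders the third frame.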
In the above embodiments and implementations, these operations may be performed in the same frame period (concurrently, in parallel) or in different frame periods (sequentially).
Accordingly, the present disclosure describes methods and systems configured to perform multi-GPU rendering of geometry for an application by pre-testing the geometry against screen regions, which may be interlaced, prior to rendering objects of each of one or more image frames in a sequence of image frames generated during execution of the application.
It is to be understood that the various embodiments defined herein can be combined or assembled into specific embodiments using the various features disclosed herein. Thus, the examples provided are only some of the possible examples and do not limit the various embodiments made possible by combining elements to define further embodiments. In some examples, an embodiment may include fewer elements without departing from the spirit of the disclosed or equivalent embodiments.
Embodiments of the present disclosure may be practiced with various computer system configurations, including hand-held devices, microprocessor systems, microprocessor-based or programmable consumer electronics, minicomputers, mainframe computers, and the like. Embodiments of the disclosure may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a wired or wireless network.
In view of the above, it should be appreciated that embodiments of the present disclosure may employ various computer-implemented operations involving data stored in computer systems. These operations are those requiring physical manipulation of physical quantities. Any of the operations described herein that form part of the embodiments of the present disclosure are useful machine operations. Embodiments of the present disclosure also relate to an apparatus or device for performing these operations. The apparatus may be specially constructed for the required purposes, or it may be a general-purpose computer selectively activated or configured by a computer program stored in the computer. In particular, various general-purpose machines may be used with computer programs written in accordance with the teachings herein, or it may be more convenient to construct a more specialized apparatus to perform the required operations.
The present disclosure may also be embodied as computer readable code on a computer readable medium. The computer readable medium is any data storage device that can store data which can thereafter be read by a computer system. Examples of computer readable media include hard disk drives, Network Attached Storage (NAS), read-only memory, random-access memory, CD-ROMs, CD-Rs, CD-RWs, magnetic tapes, and other optical and non-optical data storage devices. The computer readable medium can comprise computer readable tangible media distributed over a network-coupled computer system such that the computer readable code is stored and executed in a distributed fashion.
Although the method operations are described in a particular order, it should be understood that other housekeeping operations may be performed between the operations, or the operations may be adjusted so that they occur at slightly different times, or may be distributed in a system that allows processing operations to occur at various intervals associated with the processing, so long as the processing of the overlapping operations is performed in the desired manner.
Although the foregoing disclosure has been described in some detail for purposes of clarity of understanding, it will be apparent that certain changes and modifications may be practiced within the scope of the appended claims. Accordingly, the present embodiments are to be considered as illustrative and not restrictive, and the embodiments of the disclosure are not to be limited to the details given herein, but may be modified within the scope and equivalents of the appended claims.

Claims (24)

1. A method for graphics processing, comprising:
rendering graphics for an application using a plurality of Graphics Processing Units (GPUs);
dividing responsibilities for rendering geometry of the graphics between a plurality of GPUs based on a plurality of screen regions, each GPU having a corresponding division of the responsibilities known to the plurality of GPUs, wherein screen regions of the plurality of screen regions are staggered;
assigning the geometry of the image frames generated by the application to a GPU for geometry pre-testing;
performing the geometry pre-test at the GPU to generate information about the geometry and its relationship to each of the plurality of screen regions;
using the information at each of the plurality of GPUs in rendering the image frame;
providing the information as a hint to a rendering GPU, wherein the rendering GPU is one of the plurality of GPUs,
wherein if the information is received before the rendering GPU begins rendering the geometry, the rendering GPU considers the information,
wherein the geometry is fully rendered at the rendering GPU when the information is received after rendering of the geometry begins.
2. The method of claim 1, further comprising:
skipping rendering of the geometry at the rendering GPU when the information indicates that the geometry does not overlap any screen region allocated to the rendering GPU for object rendering, wherein the rendering GPU is one of the plurality of GPUs.
3. The method of claim 1, further comprising:
assigning a plurality of geometries of the image frames to the plurality of GPUs for the geometry pre-test,
wherein the geometries of the plurality of geometries are uniformly or non-uniformly distributed among the plurality of GPUs.
4. The method of claim 3, wherein the plurality of geometries are allocated such that consecutive geometries are processed by different GPUs.
5. The method of claim 4, wherein a first GPU performs the geometry pre-test on more geometry than a second GPU, or the first GPU performs the geometry pre-test while the second GPU does not perform geometry pre-test at all.
6. The method of claim 1, wherein the plurality of screen regions are configured to reduce an imbalance in rendering time between the plurality of GPUs.
7. The method of claim 1, wherein each of the plurality of screen regions is non-uniform in size.
8. The method of claim 1, wherein the plurality of screen regions dynamically change.
9. The method of claim 1, wherein the geometry corresponds to a geometry used or generated by a draw call.
10. The method according to claim 1,
wherein the geometry used or generated by the draw call of the application is subdivided into a plurality of geometries including the geometry for which the GPU generated the information.
11. The method of claim 1, wherein the geometry is a separate primitive.
12. The method of claim 1, wherein the information about the geometry comprises a vertex count or a primitive count.
13. The method of claim 1, wherein the information about the geometry comprises a particular set of primitives for rendering or a particular set of vertices for rendering.
14. The method of claim 1, further comprising:
using a common rendering command buffer for the plurality of GPUs; and
limiting execution of commands in the common rendering command buffer to one or more of the plurality of GPUs.
15. The method according to claim 1,
wherein the information may or may not be generated depending on the nature of the geometry.
16. The method of claim 1, further comprising:
the information is generated in a rasterization stage using a scan converter.
17. The method of claim 1, further comprising:
the information is generated in a geometry processing stage using one or more shaders.
18. The method of claim 17, wherein the one or more shaders use one or more dedicated instructions to accelerate generation of information.
19. The method of claim 17, wherein the one or more shaders do not perform an allocation of a location cache or a parameter cache.
20. The method of claim 1, further comprising:
dividing a plurality of geometries among the plurality of GPUs for the geometry pre-test,
wherein successive geometries are handled by different GPUs.
21. The method of claim 20, further comprising:
the partitioning of the plurality of geometric figures is dynamically adjusted based on the performance of each of the plurality of GPUs during the geometric figure pre-test.
22. The method according to claim 1,
wherein one or more of the plurality of GPUs are part of a larger GPU configured as a plurality of virtual GPUs.
23. A computer system, comprising:
a processor;
a memory coupled to the processor and having stored therein instructions that, if executed by the computer system, cause the computer system to perform a method for implementing a graphics pipeline, the method comprising:
rendering graphics for an application using a plurality of Graphics Processing Units (GPUs);
dividing responsibilities for rendering geometry of the graphics between a plurality of GPUs based on a plurality of screen regions, each GPU having a corresponding division of the responsibilities known to the plurality of GPUs, wherein screen regions of the plurality of screen regions are staggered;
assigning the geometry of the image frames generated by the application to the GPU for geometry pre-testing;
performing the geometry pre-test at the GPU to generate information about the geometry and its relationship to each of the plurality of screen regions;
using the information at each of the plurality of GPUs in rendering the image frame;
providing the information as a hint to a rendering GPU, wherein the rendering GPU is one of the plurality of GPUs,
wherein if the information is received before the rendering GPU begins rendering the geometry, the rendering GPU considers the information,
wherein the geometry is fully rendered at the rendering GPU when the information is received after rendering of the geometry begins.
24. The computer system of claim 23, the computer system further comprising:
skipping rendering of the geometry at a rendering GPU when the information indicates that the geometry does not overlap any screen region allocated to the rendering GPU for object rendering, wherein the rendering GPU is one of the plurality of GPUs.
CN202180023019.6A 2020-02-03 2021-02-01 System and method for efficient multi-GPU rendering of geometry by pre-testing for interlaced screen regions prior to rendering Active CN115298686B (en)

Applications Claiming Priority (7)

Application Number Priority Date Filing Date Title
US16/780,745 US12112394B2 (en) 2020-02-03 2020-02-03 System and method for efficient multi-GPU rendering of geometry by pretesting against screen regions using configurable shaders
US16/780,680 US11263718B2 (en) 2020-02-03 2020-02-03 System and method for efficient multi-GPU rendering of geometry by pretesting against in interleaved screen regions before rendering
US16/780,680 2020-02-03
US16/780,745 2020-02-03
US16/780,722 US11080814B1 (en) 2020-02-03 2020-02-03 System and method for efficient multi-GPU rendering of geometry by pretesting against screen regions using prior frame information
US16/780,722 2020-02-03
PCT/US2021/016079 WO2021158483A1 (en) 2020-02-03 2021-02-01 System and method for efficient multi-gpu rendering of geometry by pretesting against interleaved screen regions before rendering

Publications (2)

Publication Number Publication Date
CN115298686A CN115298686A (en) 2022-11-04
CN115298686B true CN115298686B (en) 2023-10-17

Family

ID=74701593

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202180023019.6A Active CN115298686B (en) 2020-02-03 2021-02-01 System and method for efficient multi-GPU rendering of geometry by pre-testing for interlaced screen regions prior to rendering

Country Status (4)

Country Link
EP (1) EP4100922A1 (en)
JP (3) JP7334358B2 (en)
CN (1) CN115298686B (en)
WO (1) WO2021158483A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114241161B (en) * 2021-12-22 2023-09-15 中设数字技术股份有限公司 BIM model rendering method and system based on double GPUs
CN117472672B (en) * 2023-12-26 2024-03-01 四川弘智远大科技有限公司 Cloud computing hardware acceleration test system and method based on GPU integration

Citations (2)

Publication number Priority date Publication date Assignee Title
US7616206B1 (en) * 2006-06-16 2009-11-10 Nvidia Corporation Efficient multi-chip GPU
CN103026402A (en) * 2010-07-19 2013-04-03 Ati科技无限责任公司 Displaying compressed supertile images

Family Cites Families (7)

Publication number Priority date Publication date Assignee Title
US8497865B2 (en) * 2006-12-31 2013-07-30 Lucid Information Technology, Ltd. Parallel graphics system employing multiple graphics processing pipelines with multiple graphics processing units (GPUS) and supporting an object division mode of parallel graphics processing using programmable pixel or vertex processing resources provided with the GPUS
JP2008071261A (en) * 2006-09-15 2008-03-27 Toshiba Corp Image processing system and image processing method
US8963931B2 (en) * 2009-09-10 2015-02-24 Advanced Micro Devices, Inc. Tiling compaction in multi-processor systems
US9245496B2 (en) * 2012-12-21 2016-01-26 Qualcomm Incorporated Multi-mode memory access techniques for performing graphics processing unit-based memory transfer operations
US9280845B2 (en) 2013-12-27 2016-03-08 Qualcomm Incorporated Optimized multi-pass rendering on tiled base architectures
US10403032B2 (en) * 2017-08-22 2019-09-03 Qualcomm Incorporated Rendering an image from computer graphics using two rendering computing devices
US10402937B2 (en) * 2017-12-28 2019-09-03 Nvidia Corporation Multi-GPU frame rendering


Also Published As

Publication number Publication date
WO2021158483A8 (en) 2022-11-24
JP2023144060A (en) 2023-10-06
JP2023505607A (en) 2023-02-09
JP7481556B2 (en) 2024-05-10
WO2021158483A1 (en) 2021-08-12
EP4100922A1 (en) 2022-12-14
JP2024091921A (en) 2024-07-05
CN115298686A (en) 2022-11-04
JP7564399B2 (en) 2024-10-08
JP7334358B2 (en) 2023-08-28

Similar Documents

Publication Publication Date Title
JP7564399B2 System and method for efficient multi-GPU rendering of geometry by pre-testing interleaved screen regions prior to rendering
JP7355960B2 (en) System and method for efficient multi-GPU rendering of geometry by geometry analysis during rendering
JP7481557B2 System and method for efficient multi-GPU rendering of geometry with area testing during rendering
US11120522B2 (en) System and method for efficient multi-GPU rendering of geometry by subdividing geometry
US12112394B2 (en) System and method for efficient multi-GPU rendering of geometry by pretesting against screen regions using configurable shaders
US11170461B2 (en) System and method for efficient multi-GPU rendering of geometry by performing geometry analysis while rendering
US11961159B2 (en) Region testing of geometry while rendering for efficient multi-GPU rendering
US11847720B2 (en) System and method for performing a Z pre-pass phase on geometry at a GPU for use by the GPU when rendering the geometry
US11954760B2 (en) Assigning geometry for pretesting against screen regions for an image frame using prior frame information
US11869114B2 (en) Efficient multi-GPU rendering by testing geometry against screen regions before rendering using a pretest GPU

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant