GB2465812A - Distributed processing for rendering 3D images

Info

Publication number: GB2465812A
Application number: GB0821938A
Authority: GB
Inventors: John Howson
Original assignee: Imagination Technologies Ltd
Current assignee: Imagination Technologies Ltd
Priority date: 2008-12-01
Filing date: 2008-12-01
Publication date: 2010-06-02
Also published as: GB0821938D0

Abstract

A method and apparatus are provided for rendering a three dimensional computer graphics image. The image is sub-divided into a plurality of rectangular areas each associated with a rectangular portion of a display. Graphics image data relating to objects to be rendered is provided and assigned to respective ones of object lists associated with each respective rectangular area. The object lists for each rectangular area are passed to a distribution means coupled to a plurality of graphics processing units. The distribution means determines which graphics processing units are able to receive data for processing and passes the object lists to individual ones of the processing units in dependence on the result of the determination.

Description

Multi-Core Rasterisation in a Tile Based Rendering System

This invention relates to a three-dimensional computer graphics rendering system and in particular to methods and apparatus which may be used for combining multiple independent graphics processing cores for the purpose of increasing rasterisation performance.

Background to the Invention

It is desirable to offer computer graphics processing cores at many different performance points; however, the complexity of modern computer graphics makes it difficult to do this in either a timely or cost-effective manner. As such, it is desirable to have a method of combining multiple independent processing cores such that performance may be increased without developing a whole new core.

Tile based rendering systems are well-known. These subdivide an image into a plurality of rectangular blocks or tiles. Figure 1 illustrates an example of a tile based rendering system. A primitive/command fetch unit 101 retrieves command and (graphics) primitive data from memory and passes this to a geometry processing unit 102. This transforms the primitive and command data into screen space using well-known methods. This data is then supplied to a tiling unit 103 which inserts object data from the screen space geometry into object lists for each of a set of defined rectangular regions or tiles into which screen space is divided. An object list for each tile contains primitives that exist wholly or partially in that tile. An object list exists for every tile on the screen, although some object lists may have no data in them. These object lists are fetched by a tile parameter fetch unit 105 which supplies them tile by tile to a hidden surface removal unit (HSR) 106 which removes the primitives of surfaces which will not contribute to the final scene (usually because they are obscured by another surface). The HSR unit processes each primitive in the tile to determine which of its pixels are visible, and passes only data for visible pixels to a texturing and shading unit (TSU) 108.
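The tiling step described above can be illustrated with a minimal Python sketch. It is not taken from the patent: the function name `build_object_lists`, the use of inclusive screen-space bounding boxes to stand in for primitives, and the tile dimensions are all assumptions made for illustration. Bounding-box binning is conservative, so a primitive may be listed in a tile that its bounding box touches even if the primitive itself does not cover any pixel there.

```python
def build_object_lists(primitives, tile_w, tile_h, tiles_x, tiles_y):
    """Bin primitives into per-tile object lists (hypothetical helper).

    primitives: dict mapping a primitive id to an inclusive screen-space
                bounding box (x0, y0, x1, y1) in pixels (an assumption;
                real hardware would work from the transformed geometry).
    Returns a dict mapping (tile_x, tile_y) to a list of primitive ids.
    """
    # An object list exists for every tile, even if it stays empty.
    object_lists = {(tx, ty): []
                    for ty in range(tiles_y)
                    for tx in range(tiles_x)}

    for prim_id, (x0, y0, x1, y1) in primitives.items():
        # Range of tiles touched by the bounding box, clamped to the screen.
        tx0 = max(x0 // tile_w, 0)
        tx1 = min(x1 // tile_w, tiles_x - 1)
        ty0 = max(y0 // tile_h, 0)
        ty1 = min(y1 // tile_h, tiles_y - 1)
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                object_lists[(tx, ty)].append(prim_id)
    return object_lists


# A 64x64 pixel screen split into 2x2 tiles of 32x32 pixels.
example = build_object_lists(
    {"A": (0, 0, 10, 10), "B": (20, 20, 40, 25), "C": (40, 40, 50, 50)},
    tile_w=32, tile_h=32, tiles_x=2, tiles_y=2,
)
```

In this example primitive "B" straddles two tiles and therefore appears in both object lists, while the lower-left tile keeps an empty list.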

The TSU takes the data from the HSR and uses it to fetch textures and apply shading to each pixel within a visible object using well-known techniques. The TSU then supplies the textured and shaded data to an alpha test/fogging/alpha blending unit 110. This is able to apply degrees of transparency/opacity to the surfaces, again using well-known techniques. Alpha blending is performed using an on-chip tile buffer 112, thereby eliminating the requirement to access external memory for this operation. It should be noted that the TSU and alpha test/fogging/alpha blending units may be fully programmable in nature.

Once each tile has been completed, the pixel processing unit 114 performs any necessary backend processing such as packing and anti-alias filtering before writing the resulting data to a rendered scene buffer 116, ready for display.

British Patent No. GB2343598 (the contents of which are incorporated herein by reference) describes scaling rasterisation performance within a tile based rendering environment by distributing workload across cores by rasterising alternate tiles on alternate cores, for example in a chequerboard pattern.

Although this approach minimises the effects of uneven distribution of workload across the tiles that make up the scene, it does not allow for all circumstances. For example, consider the image in Figure 2: triangles 1 and 2 (200, 210) in tile 0 require a total of 600 clocks of processing, while the triangles T3, T4 and T5 (220, 230 and 240) overlapping the remaining tiles each require 200 clocks of pixel processing. Given two cores processing the tiles in a chequerboard arrangement as shown, core 1 will execute a total of 800 clocks of pixel processing and core 2 will execute a total of 400 clocks of pixel processing. As a result, core 2 will remain idle for 400 clocks. This is a significant imbalance in processing load between the two processing cores.
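The 800 and 400 clock totals above can be reproduced with a short Python sketch. Figure 2 is not reproduced here, so the 2x2 row-major tile layout and the helper name `checkerboard_core` are assumptions; the per-tile clock counts are those given in the text. The sketch indexes the cores from 0, so index 0 corresponds to "core 1" in the paragraph above and index 1 to "core 2".

```python
# Per-tile pixel processing cost in clocks, as stated for Figure 2.
tile_clocks = {0: 600, 1: 200, 2: 200, 3: 200}


def checkerboard_core(tile, tiles_per_row=2, num_cores=2):
    """Fixed spatial assignment: alternate cores in a chequerboard pattern.

    Assumes tiles are numbered row-major on a grid tiles_per_row wide
    (an assumption about Figure 2, not something stated in the text).
    """
    row, col = divmod(tile, tiles_per_row)
    return (row + col) % num_cores


loads = [0, 0]
for tile, clocks in tile_clocks.items():
    loads[checkerboard_core(tile)] += clocks

# Clocks the more lightly loaded core spends waiting for the other.
idle_clocks = max(loads) - min(loads)
```

With these inputs tiles 0 and 3 land on one core and tiles 1 and 2 on the other, so the scene takes as long as the busier core.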

Summary of the Invention

Preferred embodiments of the present invention provide a method and apparatus that allow a tile based rendering system to scale rasterisation performance in a linear fashion and that minimise the loading differences across a plurality of cores. This is accomplished by the addition of separate region fetch and distribution units that distribute regions to be processed across multiple cores based on workload within each core.

Brief Description of the Drawings

Preferred embodiments of the invention will now be described in detail by way of example with reference to the accompanying drawings in which: Figure 1 illustrates an example of a prior art tile based rendering system as described above; Figure 2 illustrates the load balancing problem that occurs with a fixed assignment of cores to regions as described above; and Figure 3 illustrates a system embodying the invention.

Detailed Description of Preferred Embodiments

The output of the tiling process in a graphics rasterisation system is a set of object lists for non-overlapping regions, each of which contains references to all geometry that overlaps its respective region. As there is no spatial overlap between regions, it is possible to rasterise the regions in any order. Given this, it is possible to distribute regions across processing cores for rasterisation in an order that is dictated by loading on the individual processing cores instead of some predetermined spatial arrangement as described in our previous United Kingdom Patent No. GB2343598.

Figure 3 illustrates a proposed arrangement embodying the invention. A region fetch unit 300 reads region header data (including object list data) from memory and passes it to a region distribution unit 310, which passes the region data for processing to whichever processing core within an array of available cores 340 is least busy. Each core within the array of cores receives a signal or set of signals 330 from the region distribution unit 310 including data about the region to be processed by the processing core. Each processing core produces a return signal 320 that indicates whether or not the core can take a new region for processing at that time.

In the scenario shown in Figure 2 the system, when equipped with two processing cores, will operate as follows. The region fetch unit 300 fetches region data for tile 0 and passes it to the region distribution unit 310. As all processing cores will initially be idle, the region distribution unit will distribute data for the regions to the available cores in a round robin fashion, starting with core 0 for tile 0 and core 1 for tile 1, and so on. The region distribution unit will then need to wait for one of the processing cores to be able to accept another region before proceeding to distribute tile 2. As core 1 will only have 200 clocks of processing for tile 1, it will become free first, so the region distribution unit will pass tile 2 to it for processing, and will do so again for tile 3 when tile 2 is complete.

Therefore, initial distribution takes place sequentially, but as different processing cores take different times to perform rasterisation, the system monitors availability of cores and distributes data in dependence on this.
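The availability-driven distribution described above can be simulated with a minimal Python sketch. This is an illustrative model, not the patented hardware: the function name `distribute`, the assumption that each core holds exactly one region at a time, and the assumption of zero dispatch overhead are all introduced here. Cores are indexed 0 and 1, and the per-tile clock counts are those given for Figure 2.

```python
import heapq


def distribute(region_clocks, num_cores):
    """Hand regions, in fetch order, to whichever core becomes free first.

    region_clocks: list of (region id, processing cost in clocks).
    Returns (assignment, finish_time), where assignment maps each region id
    to a core index and finish_time is when the last core goes idle.
    """
    # Min-heap of (time at which the core is free, core index). All cores
    # start idle at time 0, so ties are broken by the lowest core index,
    # which gives the initial round robin behaviour.
    free_at = [(0, core) for core in range(num_cores)]
    heapq.heapify(free_at)

    assignment = {}
    for region, clocks in region_clocks:
        time_free, core = heapq.heappop(free_at)  # first core able to accept
        assignment[region] = core
        heapq.heappush(free_at, (time_free + clocks, core))

    finish_time = max(time_free for time_free, _ in free_at)
    return assignment, finish_time


assignment, finish_time = distribute(
    [(0, 600), (1, 200), (2, 200), (3, 200)], num_cores=2
)
```

Under these assumptions tile 0 occupies core 0 for 600 clocks while core 1 works through tiles 1, 2 and 3 in turn, so both cores finish at 600 clocks rather than the 800 clocks needed by the fixed chequerboard assignment.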

It should be noted that completion of rasterisation of the whole scene will be signalled by the region distribution unit when there are no more regions to be processed and all processing cores signal that they are idle.
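The completion condition stated above combines two signals, which can be written as a one-line Python predicate. The name `render_complete` and the representation of the cores' idle signals as a list of booleans are assumptions for illustration.

```python
def render_complete(regions_remaining, core_idle):
    """True only when no regions are left and every core reports idle.

    regions_remaining: number of regions not yet distributed.
    core_idle: one boolean per processing core (hypothetical idle signals).
    """
    return regions_remaining == 0 and all(core_idle)
```

Checking the region count alone would not be enough, because the last regions to be distributed may still be in the middle of rasterisation on one or more cores.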

In a modification to the invention it is proposed that the region fetch unit and the region distribution unit allow subsequent renders for the next field or frame to be executed in the case where one or more of the processing cores has become idle and there are no more regions to be rasterised in the current scene.
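One way to picture this modification is as a region fetch that falls through to the next render once the current one has no regions left. The Python sketch below is only an illustration under stated assumptions: the function name `next_region`, the queue-of-queues data structure, and the region labels are hypothetical and are not described in the source.

```python
from collections import deque


def next_region(renders):
    """Return the next region to distribute, or None if nothing is pending.

    renders: deque of deques, oldest render first; each inner deque holds
    the regions of one render that have not yet been distributed.
    """
    while renders:
        if renders[0]:
            return renders[0].popleft()
        # The current scene has no regions left, so an idle core can be
        # given work from the subsequent render instead of waiting.
        renders.popleft()
    return None


pending = deque([deque(["frame0/tile3"]),
                 deque(["frame1/tile0", "frame1/tile1"])])
order = [next_region(pending) for _ in range(4)]
```

The sketch deliberately ignores any dependencies between consecutive renders, which a real system would need to respect before starting the next frame early.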

Claims

1. A method for rendering a three dimensional computer graphics image comprising the steps of subdividing an image into a plurality of rectangular areas, each associated with a rectangular portion of a display; providing graphics image data relating to objects in the image to be rendered; for each rectangular area, assigning data associated with each object to an object list associated with the rectangular area; passing the object lists for each rectangular area to a distribution means coupled to a plurality of graphics processing units; determining at the distribution means which graphics processing units are available to receive data for processing; and passing object lists to individual ones of the graphics processing units in dependence on the result of the determination.
2. A method according to claim 1 including the step of providing a return signal from each graphics processing unit indicating its availability to accept data, for use in the determining step.
3. A method according to any preceding claim including the step of commencing rendering of a subsequent field or frame when one or more graphics processing units become idle.
4. A method according to any preceding claim in which the distribution means determines whether or not graphics processing units are idle.
5. A system for rendering a three dimensional computer graphics image comprising: means for sub-dividing an image into a plurality of rectangular areas, each associated with a rectangular portion of a display; means for providing graphics image data relating to objects in the image to be rendered; means for assigning data associated with each object for each rectangular area to an object list associated with the rectangular area; means for passing the object lists for each rectangular area to a distribution means coupled to a plurality of graphics processing units; means for determining at the distribution means which graphics processing units are able to receive data for processing; and means for passing object lists to individual ones of the graphics processing units in dependence on the result of the determination.
6. A system according to claim 5 including means for providing a return signal from each graphics processing unit indicating its availability to accept data for use in the determining step.
7. A system according to claim 5 or 6 including means for commencing rendering of a subsequent frame when one or more graphics processing units become idle.
8. A system according to any of claims 5 to 7 in which the distribution means determines whether or not a graphics processing unit is idle.