
CN116894906A - Graphics rendering method and processor hardware architecture - Google Patents

Publication number
CN116894906A
Authority
CN
China
Prior art keywords
primitives
rendering mode
rendering
preset number
cache
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202311162624.9A
Other languages
Chinese (zh)
Inventor
姜涛
刘天江
宋子豪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hanbo Semiconductor Shanghai Co ltd
Original Assignee
Hanbo Semiconductor Shanghai Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hanbo Semiconductor Shanghai Co ltd
Priority to CN202311162624.9A
Publication of CN116894906A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 3D [Three Dimensional] image rendering
    • G06T15/005 General purpose rendering architectures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/20 Processor architectures; Processor configuration, e.g. pipelining
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2200/00 Indexing scheme for image data processing or generation, in general
    • G06T2200/28 Indexing scheme for image data processing or generation, in general involving image processing hardware

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Graphics (AREA)
  • Image Generation (AREA)

Abstract

The application provides a graphics rendering method and a processor hardware architecture. The graphics rendering method comprises the following steps: S1, acquiring a preset number of primitives from a cache; S2, selecting an immediate rendering mode or a deferred rendering mode, according to rendering mode indication information, to batch-render the preset number of primitives; S3, repeating steps S1 and S2 until all primitives pre-stored in the cache have been rendered. With this method and processor architecture, the rendering mode indication information is generated from the coverage condition of the current batch of primitives, so the rendering mode is selected adaptively and graphics rendering efficiency is effectively improved.

Description

Graphics rendering method and processor hardware architecture
Technical Field
The present application relates to the field of computer image processing, and in particular, to a graphics rendering method and a processor hardware architecture.
Background
Graphics processors (GPUs) are microprocessors that perform image and graphics-related operations on personal computers, workstations, gaming machines, and some mobile devices (e.g., tablet computers and smartphones). GPU rendering refers to the process of converting a three-dimensional model or scene into a two-dimensional image using a GPU. In a GPU rendering task, the primitives drawn in one frame may overlap one another, and covered pixels are never displayed on the screen. If the GPU shades every pixel, including the covered ones, rendering resources are wasted and rendering efficiency drops.
At present, GPUs mainly use two rendering modes: immediate rendering and deferred rendering. Immediate rendering renders primitives one at a time. It is efficient when primitive coverage is low, but its repeated memory reads and writes consume a great deal of bandwidth, so it is inefficient when coverage is high. Deferred rendering collects all the primitives to be drawn for one frame by screen sub-block (tile), merges the visible pixels, and only then renders them.
Accordingly, it is desirable to provide a method capable of effectively improving graphics rendering efficiency.
Disclosure of Invention
In view of the above, the present application provides a graphics rendering method and a processor hardware architecture for solving the above-mentioned technical problems in the prior art.
According to an aspect of the present application, there is provided a graphics rendering method, the method including:
S1, acquiring a preset number of primitives from a cache;
S2, selecting an immediate rendering mode or a deferred rendering mode, according to rendering mode indication information, to batch-render the preset number of primitives;
S3, repeating steps S1 and S2 until all primitives pre-stored in the cache have been rendered,
wherein the immediate rendering mode includes:
performing rasterization and depth test processing on each primitive in the preset number of primitives;
setting the rendering mode indication information according to the coverage condition of the preset number of primitives,
wherein the deferred rendering mode includes:
dividing the preset number of primitives into screen sub-blocks to obtain a screen sub-block list;
according to mask information in the screen sub-block list, performing rasterization processing on the primitives in the screen sub-blocks and depth test processing on the uncovered pixels of those primitives;
and setting the rendering mode indication information according to the coverage condition of the preset number of primitives.
According to some embodiments of the present application, the above method further includes, before the first execution of step S2, setting the rendering mode indication information to any one of the immediate rendering mode and the delayed rendering mode.
According to some embodiments of the application, the delayed rendering mode further comprises counting the number of uncovered pixels by a counter during the depth test.
According to some embodiments of the application, the step of setting the rendering mode indication information according to coverage of a predetermined number of primitives includes:
if the coverage rate of the preset number of primitives is smaller than a preset threshold value, setting the rendering mode indication information into an immediate rendering mode;
and if the coverage rate of the preset number of primitives is greater than or equal to a preset threshold value, setting the rendering mode indication information into a delayed rendering mode.
According to some embodiments of the application, the primitives are triangles obtained by a geometry processing stage and the cache is a cache on a GPU chip.
According to some embodiments of the application, the method further comprises, before step S1:
storing the primitives obtained in the geometric processing stage into a primitive cache;
obtaining a preset number of primitives from a primitive cache, and storing the preset number of primitives into the cache.
According to some embodiments of the application, the immediate rendering mode and the deferred rendering mode further comprise shading the pixels that remain after rasterization and depth test processing.
According to an aspect of the present application, there is provided a processor hardware architecture for graphics rendering, the hardware architecture comprising a cache, a rendering mode control unit, a screen sub-block processing unit, and a rasterization and depth test unit,
the cache is configured to store a preset number of primitives;
the screen sub-block processing unit is configured to acquire a preset number of primitives from the cache, and divide the preset number of primitives into screen sub-blocks to acquire a screen sub-block list;
the rasterization and depth test unit is configured to select an immediate rendering mode or a delayed rendering mode to conduct batch rendering on the preset number of primitives according to the rendering mode indication information;
wherein, in the immediate rendering mode, the rasterization and depth test unit is configured to:
perform rasterization and depth test processing on each primitive in the preset number of primitives;
provide the coverage information of the preset number of primitives to the rendering mode control unit,
wherein in the delayed rendering mode, the rasterization and depth test unit is configured to:
according to the mask information in the screen sub-block list, carrying out rasterization processing on the primitives in the screen sub-blocks and carrying out depth test processing on uncovered pixel points in the primitives in the screen sub-blocks;
providing the coverage information of the predetermined number of primitives to a rendering mode control unit,
and wherein the rendering mode control unit is configured to provide rendering mode indication information according to the coverage information of the predetermined number of primitives.
According to an aspect of the present application, there is provided an electronic apparatus including:
one or more processors;
a memory for storing executable instructions;
the one or more processors are configured to implement the methods described above via the executable instructions.
According to an aspect of the present application, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, causes the processor to perform the method described above.
According to the graphics rendering method and the graphics processor of the present application, rendering mode indication information is generated from the coverage condition of the current batch of primitives and is used to select either the immediate rendering mode or the deferred rendering mode for the next batch, so that the rendering mode is selected adaptively. The method and the processor therefore combine the advantages of immediate rendering and deferred rendering, and can effectively improve graphics rendering efficiency.
Drawings
The accompanying drawings are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification; they illustrate the application and do not limit it.
FIG. 1 illustrates a flow chart of a graphics rendering method provided by an exemplary embodiment of the present application;
FIG. 2 is a schematic diagram showing the hardware architecture of a graphics processor according to an exemplary embodiment of the present application;
FIG. 3 shows a block diagram of an electronic device according to an exemplary embodiment of the present application.
Detailed Description
Various exemplary embodiments of the present application will be described in detail below with reference to the accompanying drawings. The description of the exemplary embodiments is merely illustrative, and is not intended to be any limitation on the application, its application or use. The present application may be embodied in many different forms and is not limited to the embodiments described herein. These embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the application to those skilled in the art.
Unless the context clearly indicates otherwise, if the number of elements is not specifically limited, the elements may be one or more. As used in this specification, the term "plurality/s/these" means two or more, and the term "based on/according to" should be interpreted as "based at least in part on/according to". Furthermore, the term "and/or" and "at least one of …" encompasses any and all possible combinations of the listed items.
Referring to fig. 1, a flowchart of a graphics rendering method provided by an exemplary embodiment of the present application is shown. The method according to an embodiment of the application may be performed by a general-purpose Central Processing Unit (CPU) and preferably by a Graphics Processor (GPU). Since the graphics processor has a dedicated hardware architecture (such as that shown in fig. 2), a higher execution speed can be obtained.
As shown in fig. 1, the graphic rendering method provided by the exemplary embodiment of the present application includes:
S1, acquiring a preset number of primitives from a cache;
S2, selecting an immediate rendering mode or a deferred rendering mode, according to rendering mode indication information, to batch-render the preset number of primitives;
S3, repeating steps S1 and S2 until all primitives pre-stored in the cache have been rendered.
Before step S1, once the geometry processing stage is complete, the assembled primitives are stored in the primitive cache; a preset number of primitives is then obtained from the primitive cache and stored in the cache. The number of primitives stored in the cache may be preset as needed, for example to 100k.
In step S1, a predetermined number of primitives is fetched from the cache. The cache may be, for example, a level-3 cache (L3 cache) on a central processing unit (CPU) or graphics processor (GPU), which lets the method obtain the predetermined number of primitives quickly. The method renders primitives in batches, where one batch contains the predetermined number of primitives (e.g., 128 primitives). A primitive is a basic graphic element, preferably a triangle in the embodiments of the present application.
In step S2, an immediate rendering mode or a deferred rendering mode is selected, according to the rendering mode indication information, to batch-render the predetermined number of primitives. Before step S2 is executed for the first time, the rendering mode indication information may be set to either the immediate rendering mode or the deferred rendering mode, and that mode is used to render the first batch of primitives acquired from the cache. In some embodiments, the coverage rate of the first batch may be measured first: if it is greater than or equal to a preset threshold, the rendering mode indication information is set to the deferred rendering mode; if it is smaller than the preset threshold, the indication information is set to the immediate rendering mode. For each subsequent batch retrieved from the cache, the immediate or deferred rendering mode is selected based on the rendering mode indication information obtained while rendering the previous batch.
In step S3, steps S1 and S2 are repeated until all primitives pre-stored in the cache have been rendered. In every round, whether the batch is rendered in the immediate or the deferred rendering mode, rendering mode indication information is generated from the coverage condition of that batch and indicates the rendering mode to use in the next round. The method can therefore adapt the rendering mode of the next batch to the primitive coverage of the current batch, which improves rendering efficiency. The number of rounds depends on the number of primitives pre-stored in the cache and on the batch size (the predetermined number of primitives retrieved from the cache each time); each round renders one batch, and the loop ends when every pre-stored primitive has been rendered.
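The control flow of steps S1 to S3 can be sketched in software as follows. This is only an illustrative model under stated assumptions, not the claimed hardware: the names `render_all`, `render_batch` and `fake_render_batch` are hypothetical, and the stand-in renderer merely pretends that a batch made up mostly of "hidden" primitives has high coverage.

```python
from typing import Callable, List, Sequence

IMMEDIATE = "immediate"
DEFERRED = "deferred"


def render_all(
    primitives: Sequence[object],
    batch_size: int,
    render_batch: Callable[[Sequence[object], str], str],
    initial_mode: str = IMMEDIATE,
) -> List[str]:
    """Render primitives batch by batch; return the mode used for each batch."""
    mode = initial_mode  # rendering mode indication information
    modes_used = []
    for start in range(0, len(primitives), batch_size):
        batch = primitives[start:start + batch_size]  # step S1: fetch one batch
        modes_used.append(mode)
        # Step S2: render in the indicated mode; the renderer reports the
        # mode indication to use for the next batch.
        mode = render_batch(batch, mode)
    return modes_used  # step S3: the loop ends once every primitive is rendered


def fake_render_batch(batch, mode):
    # Hypothetical stand-in for real rendering: request deferred rendering
    # next when at least half of the batch is marked "hidden".
    hidden = sum(1 for p in batch if p == "hidden")
    return DEFERRED if hidden * 2 >= len(batch) else IMMEDIATE


prims = ["visible"] * 4 + ["hidden"] * 4 + ["visible"] * 2
print(render_all(prims, 4, fake_render_batch))
# → ['immediate', 'immediate', 'deferred']
```

Note that the mode chosen from one batch only takes effect on the following batch, which is why the second batch above is still rendered in the immediate mode. If "100k" is read as 100,000 primitives and the batch size is 128, the loop runs 782 rounds, the last one with a partial batch of 32.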
In a method according to an embodiment of the application, the immediate rendering mode comprises the following steps: performing rasterization and depth test processing on each primitive in the preset number of primitives; and setting the rendering mode indication information according to the coverage condition of the preset number of primitives.
In the immediate rendering mode, rasterization and depth test processing are performed primitive by primitive. During rasterization and depth testing, the coverage condition of the primitives must be tracked; it may be expressed, for example, as a primitive coverage rate. Because the depth test culls covered pixels one by one, a counter can record the number of culled (covered) pixels or the number of retained (uncovered) pixels, and the coverage rate of the batch can be estimated as the ratio of culled pixels to the total number of pixels. Alternatively, using the relation between culled pixels and their primitives, a counter can record the number of covered or uncovered primitives; dividing the number of covered primitives by the total number of primitives in the batch gives the batch's coverage rate accurately. The immediate rendering mode is efficient when primitive coverage is low and inefficient when it is high. The method sets the rendering mode indication information from the coverage rate of the current batch and uses it to indicate the rendering mode of the next batch, so that the next batch is rendered in an appropriate mode.
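The pixel-counter estimate described above can be illustrated with a small software model. It is a sketch under assumptions that the text does not fix: each primitive is represented by a hypothetical list of already-rasterized `(x, y, depth)` fragments, a smaller depth value is taken to mean "closer", and `immediate_render` is an illustrative name.

```python
def immediate_render(batch, depth_buffer):
    """Depth-test every fragment of every primitive, one primitive at a time.

    batch: list of primitives, each a list of (x, y, depth) fragments
           (a hypothetical stand-in for rasterizer output).
    depth_buffer: dict mapping (x, y) -> nearest depth seen so far.
    Returns (surviving_fragments, estimated_coverage_rate).
    """
    total = 0
    rejected = 0  # counter of culled (covered) pixels
    survivors = []
    for primitive in batch:
        for x, y, depth in primitive:
            total += 1
            nearest = depth_buffer.get((x, y), float("inf"))
            if depth < nearest:  # assumed convention: smaller depth is closer
                depth_buffer[(x, y)] = depth
                survivors.append((x, y, depth))
            else:
                rejected += 1
    coverage = rejected / total if total else 0.0
    return survivors, coverage


near = [(0, 0, 0.2), (1, 0, 0.2)]
far = [(0, 0, 0.8), (1, 0, 0.8), (2, 0, 0.8)]
_, rate = immediate_render([near, far], {})
print(rate)  # → 0.4 (2 of 5 fragments were culled)
```

In this sketch the estimate depends on submission order: if `far` were submitted before `near`, no fragment would be culled and the two overwritten pixels would still have been passed on, which is the kind of wasted work the text attributes to immediate rendering under high coverage.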
In a method according to an embodiment of the application, the deferred rendering mode comprises the following steps: dividing the predetermined number of primitives into screen sub-blocks to obtain a screen sub-block list; according to mask information in the screen sub-block list, performing rasterization processing on the primitives in the screen sub-blocks and depth test processing on the uncovered pixels of those primitives; and setting the rendering mode indication information according to the number of uncovered primitives among the predetermined number of primitives.
In the deferred rendering mode, 128 primitives are first fetched from the cache and their parameters are stored in specified locations in the cache. The 128 primitives are then treated as a group to generate a screen sub-block list, which is stored in a screen sub-block list cache. Finally, the parameters of the uncovered primitives are fetched according to the mask information in the screen sub-block list, and rasterization and depth test processing are performed on those primitives. The mask information identifies which screen sub-blocks contain primitives or parts of primitives and which do not. The rasterization unit can process the primitives contained in each screen sub-block according to the mask information. After the primitives are rasterized, the depth test unit can depth-test only the uncovered pixels in a sub-block, using the depth values of the pixels in that sub-block, rather than depth-testing every pixel, which improves graphics rendering efficiency.
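One possible way to build a screen sub-block list with mask information is sketched below. The text does not specify the binning algorithm or the mask layout, so this example assumes conservative bounding-box binning and a one-bit-per-tile mask; `build_tile_list`, `tile_size`, `tiles_x` and `tiles_y` are hypothetical names.

```python
def build_tile_list(batch, tile_size, tiles_x, tiles_y):
    """Bin triangles into screen sub-blocks (tiles) by bounding box.

    batch: list of triangles, each three (x, y) vertices in pixel coordinates.
    Returns (tile_list, mask): tile_list maps (tx, ty) to the indices of the
    primitives whose bounding box touches that tile; bit (ty * tiles_x + tx)
    of mask is set when the tile holds at least one primitive.
    """
    tile_list = {}
    mask = 0
    for index, triangle in enumerate(batch):
        xs = [vertex[0] for vertex in triangle]
        ys = [vertex[1] for vertex in triangle]
        # Clamp the bounding box to the tile grid so off-screen parts are ignored.
        tx0 = max(int(min(xs)) // tile_size, 0)
        tx1 = min(int(max(xs)) // tile_size, tiles_x - 1)
        ty0 = max(int(min(ys)) // tile_size, 0)
        ty1 = min(int(max(ys)) // tile_size, tiles_y - 1)
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                tile_list.setdefault((tx, ty), []).append(index)
                mask |= 1 << (ty * tiles_x + tx)
    return tile_list, mask


batch = [
    [(1, 1), (10, 1), (1, 10)],    # lies inside tile (0, 0)
    [(20, 2), (30, 2), (20, 12)],  # lies inside tile (1, 0)
]
tile_list, mask = build_tile_list(batch, tile_size=16, tiles_x=2, tiles_y=2)
print(tile_list)  # → {(0, 0): [0], (1, 0): [1]}
print(bin(mask))  # → 0b11 (the two bottom-row tiles stay empty)
```

A rasterizer driven by such a mask can skip tiles whose bit is clear. Bounding-box binning may list a triangle in a tile it does not actually touch; an exact triangle/tile overlap test would be stricter.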
During rasterization and depth testing, the coverage condition of the primitives must be tracked; it may be expressed, for example, as a primitive coverage rate. In depth test processing, uncovered pixels are processed and covered pixels are not (that is, depth testing may be performed on uncovered pixels only). A counter can therefore record the number of covered or uncovered pixels, and the coverage condition of the primitives can be estimated as the ratio of covered pixels to the total number of pixels. In some embodiments, using the relation between covered pixels and their primitives, a counter records the number of covered or uncovered primitives, and dividing the number of covered primitives by the total number of primitives in the batch gives the batch's coverage rate accurately. The deferred rendering mode is inefficient when primitive coverage is low and efficient when it is high. The method sets the rendering mode indication information from the coverage rate of the current batch and uses it to indicate the rendering mode of the next batch, so that the next batch is rendered in an appropriate mode.
In both the immediate and the deferred rendering mode, the primitives that have undergone rasterization and depth test processing are then processed further, for example shaded. Shading is applied to the pixels that remain after rasterization and depth test processing.
In some embodiments, the step of setting the rendering mode indication information according to coverage of a predetermined number of primitives (i.e. primitives of a batch) specifically includes:
if the coverage rate of the preset number of primitives is smaller than a preset threshold value, setting the rendering mode indication information into an immediate rendering mode;
and if the coverage rate of the preset number of primitives is greater than or equal to a preset threshold value, setting the rendering mode indication information into a delayed rendering mode.
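The threshold rule above amounts to a single comparison. The following sketch combines it with the primitive-level coverage rate (covered primitives divided by the batch size); the function names and the threshold value 0.5 are assumptions chosen for illustration, since the text only says the threshold is preset.

```python
def coverage_rate(covered_primitives: int, total_primitives: int) -> float:
    """Fraction of the batch's primitives that are covered."""
    if total_primitives == 0:
        return 0.0
    return covered_primitives / total_primitives


def next_render_mode(rate: float, threshold: float) -> str:
    """Return the rendering mode indication for the next batch."""
    # Below the threshold: immediate; at or above the threshold: deferred.
    return "immediate" if rate < threshold else "deferred"


THRESHOLD = 0.5  # hypothetical value
print(next_render_mode(coverage_rate(32, 128), THRESHOLD))  # → immediate
print(next_render_mode(coverage_rate(64, 128), THRESHOLD))  # → deferred
```

The boundary case matters: a batch whose coverage rate equals the threshold exactly selects the deferred rendering mode.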
When the coverage rate of a batch is low, few primitives are covered and a large proportion must be rendered, so the immediate rendering mode avoids the computing resources and bandwidth that the deferred rendering mode would spend on screen sub-block processing, which improves rendering efficiency. When the coverage rate of a batch is high, the deferred rendering mode avoids the resources and bandwidth that the immediate rendering mode would spend on needlessly rendering covered primitives (or pixels), which likewise improves rendering efficiency.
Referring to FIG. 2, a processor hardware architecture for graphics rendering according to an exemplary embodiment of the present application is shown. This hardware architecture is typically used in a graphics processor (GPU).
As shown in FIG. 2, the hardware architecture mainly includes a cache, a rendering mode control unit, a screen sub-block processing unit, and a rasterization and depth test unit. The cache is configured to store a preset number of primitives. The screen sub-block processing unit is configured to acquire a predetermined number of primitives from the cache and to divide the predetermined number of primitives into screen sub-blocks to obtain a screen sub-block list. The rasterization and depth test unit is configured to select an immediate rendering mode or a deferred rendering mode, according to the rendering mode indication information, to batch-render the predetermined number of primitives.
According to the processor hardware architecture of an embodiment of the application, in the immediate rendering mode, the rasterization and depth test unit is configured to: perform rasterization and depth test processing on each primitive in the predetermined number of primitives; and provide coverage information of the predetermined number of primitives to the rendering mode control unit.
According to the processor hardware architecture of an embodiment of the present application, in the deferred rendering mode, the rasterization and depth test unit is configured to: according to mask information in the screen sub-block list, perform rasterization processing on the uncovered primitives among the predetermined number of primitives and depth test processing on the uncovered pixels in the screen sub-blocks; and provide coverage information of the predetermined number of primitives to the rendering mode control unit.
According to the processor hardware architecture of the embodiment of the application, the rendering mode control unit is configured to provide rendering mode indication information according to coverage information of a predetermined number of primitives.
The processor hardware architecture according to an embodiment of the present application shown in fig. 2 is capable of performing the graphics rendering method according to an embodiment of the present application. The processor hardware architecture shown in fig. 2 will be described in detail below.
The geometry shading unit and the primitive assembly unit of the processor hardware architecture according to this embodiment perform geometry processing. After geometry processing is complete, the assembled primitives (e.g., triangles) are stored in the primitive cache and the rendering mode control unit is notified. The rendering mode control unit stores the primitive information in the cache and records the number of triangles. When the number of triangles reaches a preset upper limit (e.g., 100k), it stops receiving triangles from the geometry processing stage and starts the screen sub-block processing unit. The screen sub-block processing unit decides whether to perform the screen sub-block division operation according to the rendering mode indication information provided by the rendering mode control unit.
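The intake bookkeeping of the rendering mode control unit can be modeled roughly as follows. This is a toy software model, not the hardware design: the class name, the `on_primitive` method and the `tiling_started` flag are hypothetical, and the text does not say what happens when a frame ends before the upper limit is reached, so that case is left out.

```python
class RenderModeControlUnit:
    """Toy model of the triangle intake described above."""

    def __init__(self, upper_limit):
        self.upper_limit = upper_limit  # e.g. 100k in the text
        self.cache = []                 # stands in for the on-chip cache
        self.tiling_started = False     # True once the sub-block unit is started

    def on_primitive(self, triangle):
        """Handle a notification that the geometry stage assembled a triangle.

        Returns True if the triangle was accepted, False once intake has stopped.
        """
        if self.tiling_started:
            return False
        self.cache.append(triangle)
        if len(self.cache) >= self.upper_limit:
            # Upper limit reached: stop intake and hand over to the
            # screen sub-block processing unit.
            self.tiling_started = True
        return True


unit = RenderModeControlUnit(upper_limit=3)
accepted = [unit.on_primitive(t) for t in ["t0", "t1", "t2", "t3"]]
print(accepted, len(unit.cache), unit.tiling_started)
# → [True, True, True, False] 3 True
```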
If the rendering mode indication information indicates the deferred rendering mode, the screen sub-block division operation must be performed. In this case, the screen sub-block processing unit divides the triangles among a configurable number of screen sub-blocks. First, it fetches a predetermined number (for example, 128) of triangles from the cache and stores their parameters in specified locations of the triangle cache. It then treats the predetermined number of triangles as a group, generates a screen sub-block list, and stores the list in a screen sub-block list cache. Finally, the rasterization and depth test unit is started; according to the mask information, it rasterizes the uncovered primitives in the screen sub-blocks and performs depth test processing on the uncovered pixels in the screen sub-blocks.
If the rendering mode indication information indicates the immediate rendering mode, the screen sub-block division operation is not needed. In this case, the screen sub-block processing unit stores the triangle attribute information directly in the triangle cache, and the rasterization and depth test unit reads the triangle information directly to perform rasterization and depth test processing. After the rasterization and depth test unit has processed a predetermined number of triangles, the screen sub-block processing unit fetches the next predetermined number of triangles from the cache to update the triangle cache, and this is repeated until all the triangles in the cache (e.g., 100k) have been processed.
The rasterization and depth test unit can render primitives in either of the two rendering modes, immediate rendering and deferred rendering. When the deferred rendering mode is selected, the unit performs rasterization and depth testing per screen sub-block, counts the coverage of the current batch of triangles with a counter, and feeds the result back to the rendering mode control unit. The rendering mode control unit compares the triangle coverage rate with a configurable threshold. If the coverage rate of a batch is greater than or equal to the threshold, the triangles in that batch overlap heavily and the next batch is rendered in the deferred rendering mode; if it is smaller than the threshold, the triangles overlap only slightly and the next batch is rendered in the immediate rendering mode. Likewise, when the immediate rendering mode is selected, the rasterization and depth test unit counts and feeds back the coverage of the current batch of triangles. In both cases the rendering mode of the next batch of triangles is determined by the rendering mode control unit. After rasterization and depth test processing, the retained pixels are sent to a pixel shading unit for shading.
It should be appreciated that the processor hardware architecture shown in fig. 2 is capable of performing the method described previously in this specification. Thus, the operations, features and advantages described above with respect to the method apply equally to the processor hardware architecture and the unit modules comprised thereby; the operations, features and advantages described above with respect to the processor hardware architecture and the unit modules comprised thereof are equally applicable to the method. For brevity, substantially identical/similar operations, features and advantages are not described in detail herein.
Although specific functions are discussed above with reference to specific modules, it should be noted that the functions of each unit module in the technical solution of the present application may be implemented by dividing the functions into a plurality of unit modules, and/or at least some functions of the plurality of unit modules may be implemented by combining the functions into a single unit module. The manner in which a particular unit module performs an action in the present application includes that the particular unit module itself performs the action, or that the particular unit module invokes or otherwise accesses the performed action (or performs the action in conjunction with the particular unit module). Thus, a particular unit module that performs an action may include that particular unit module itself that performs the action and/or another unit module that the particular unit module invokes or otherwise accesses that performs the action.
In addition to the above technical solution, the application further provides an electronic device comprising one or more processors and a memory for storing executable instructions, wherein the one or more processors are configured to implement the above-described method via the executable instructions.
The application also provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, causes the processor to perform the above method.
In the following part of the present description, illustrative examples of the aforementioned electronic device, non-transitory computer readable storage medium, and computer program product will be described in connection with fig. 3.
Fig. 3 shows a block diagram of an electronic device according to an exemplary embodiment of the present application. The system provided by the present application may also be implemented, in whole or in part, by electronic device 900 or a similar device or system.
The electronic device 900 may be a variety of different types of devices. Examples of electronic device 900 include, but are not limited to: desktop, server, notebook or netbook computers, mobile devices, wearable devices, entertainment devices, televisions or other display devices, automotive computers, and the like.
The electronic device 900 may include at least one processor 902, memory 904, communication interface(s) 909, display device 901, other input/output (I/O) devices 910, and one or more mass storage devices 903, which can communicate with each other, such as through a system bus 911 or other suitable connection.
The processor 902 may be a single processing unit or multiple processing units, each of which may include one or more computing units or cores. The processor 902 may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuits, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the processor 902 may be configured to obtain and execute computer-readable instructions stored in the memory 904, the mass storage device 903, or another computer-readable medium, such as program code of the operating system 905, program code of the application programs 906, program code of other programs 907, and so forth.
Memory 904 and mass storage device 903 are examples of computer-readable storage media for storing instructions that are executed by processor 902 to implement the various functions as previously described. For example, the memory 904 may generally include volatile memory and non-volatile memory. In addition, mass storage devices 903 may generally include hard drives, solid state drives, removable media, and the like. The memory 904 and the mass storage device 903 may both be referred to collectively as memory or computer-readable storage media in the present application, and may be non-transitory media capable of storing computer-readable, processor-executable program instructions as computer program code that may be executed by the processor 902 as a particular machine configured to implement the operations and functions described in the examples of this application.
A number of programs may be stored on mass storage device 903. These programs include an operating system 905, one or more application programs 906, other programs 907, and program data 908, and may be loaded into the memory 904 for execution. Examples of such application programs or program modules may include, for example, computer program logic (e.g., computer program code or instructions) for implementing the following components/functions: the methods provided by the present application (including any suitable steps of the methods) and/or additional embodiments described herein.
Although illustrated in fig. 3 as being stored in memory 904 of electronic device 900, operating system 905, one or more application programs 906, other programs 907, and program data 908, or portions thereof, may be implemented using any form of computer readable media accessible by electronic device 900. Herein, a computer-readable medium may be any available computer-readable storage medium or communication medium that can be accessed by a computer.
Communication media includes, for example, computer readable instructions, data structures, program modules, or other data in a communication signal that is transferred from one system to another system. The communication medium may include a conductive transmission medium, as well as a wireless medium capable of propagating energy waves. Computer readable instructions, data structures, program modules, or other data may be embodied as a modulated data signal, for example, in a wireless medium. The modulation may be analog, digital or hybrid modulation techniques.
By way of example, computer-readable storage media may include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. For example, computer-readable storage media include, but are not limited to: volatile memory, such as random access memory; nonvolatile memory, such as flash memory, various read-only memories, and magnetic and ferromagnetic/ferroelectric memories; magnetic and optical storage devices; and other known or later-developed media capable of storing computer-readable information/data for use by a computer system.
One or more communication interfaces 909 are used to exchange data with other devices, such as via a network, a direct connection, or the like. Such communication interfaces may be one or more of the following: any type of network interface, wired or wireless interface, WiMAX interface, Ethernet interface, universal serial bus interface, cellular network interface, Bluetooth interface, NFC interface, etc. Communication interface 909 may facilitate communication within a variety of networks and protocol types, including wired and wireless networks, the Internet, and the like. Communication interface 909 may also provide for communication with external storage devices (not shown), such as in a storage array, network attached storage, storage area network, or the like.
In some examples, a display device 901, such as a monitor, may be included for displaying information and images to a user. Other I/O devices 910 may be devices that receive various inputs from a user and provide various outputs to the user, and may include touch input devices, gesture input devices, cameras, keyboards, remote controls, mice, printers, audio input/output devices, and so on. The technical solutions described herein may be supported by these various configurations of the electronic device 900 and are not limited to the specific examples of the technical solutions described herein.
While the application has been illustrated and described in detail in the drawings and foregoing description, such illustration and description are to be considered illustrative and schematic and not restrictive; it will be evident to those skilled in the art that the application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential characteristics thereof.
The scope of the application is, therefore, indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is evident that the word "comprising" does not exclude other elements or steps, and that the singular does not exclude a plurality. A plurality of units or means recited in the apparatus claims can also be implemented by means of one unit or means in software or hardware.

Claims (10)

1. A method of graphics rendering, the method comprising:
S1, acquiring a preset number of primitives from a cache;
S2, selecting an immediate rendering mode or a deferred rendering mode to perform batch rendering on the preset number of primitives according to rendering mode indication information;
S3, repeating steps S1 and S2 until all primitives pre-stored in the cache have been rendered,
wherein the immediate rendering mode includes:
performing rasterization and depth testing on each primitive in the preset number of primitives;
setting the rendering mode indication information according to the coverage condition of the preset number of primitives,
wherein the deferred rendering mode includes:
dividing the preset number of primitives into screen sub-blocks to obtain a screen sub-block list;
according to the mask information in the screen sub-block list, carrying out rasterization processing on the primitives in the screen sub-blocks and carrying out depth test processing on uncovered pixel points in the primitives in the screen sub-blocks;
and setting the rendering mode indication information according to the coverage condition of the preset number of primitives.
2. The method according to claim 1, further comprising, before performing step S2 for the first time, setting the rendering mode indication information to either the immediate rendering mode or the deferred rendering mode.
3. The method of claim 1, wherein the deferred rendering mode further comprises counting, by a counter, a number of uncovered pixels during the depth test.
4. The method according to claim 1, wherein the step of setting the rendering mode indication information according to the coverage condition of the preset number of primitives includes:
if the coverage rate of the preset number of primitives is smaller than a preset threshold value, setting the rendering mode indication information to the immediate rendering mode;
and if the coverage rate of the preset number of primitives is greater than or equal to the preset threshold value, setting the rendering mode indication information to the deferred rendering mode.
5. The method of claim 1, wherein the primitives are triangles obtained from a geometric processing stage and the cache is a cache on a GPU chip.
6. The method according to claim 4, characterized in that the method further comprises, before step S1:
storing the primitives obtained in the geometric processing stage into a primitive cache;
obtaining a preset number of primitives from a primitive cache, and storing the preset number of primitives into the cache.
7. The method of claim 1, wherein the immediate rendering mode and the deferred rendering mode further comprise coloring pixels that remain after rasterization and depth test processing.
8. A processor hardware architecture for graphics rendering, characterized in that the hardware architecture comprises a cache, a rendering mode control unit, a screen sub-block processing unit, and a rasterization and depth test unit,
the cache is configured to store a preset number of primitives;
the screen sub-block processing unit is configured to acquire a preset number of primitives from the cache, and divide the preset number of primitives into screen sub-blocks to acquire a screen sub-block list;
the rasterization and depth test unit is configured to select an immediate rendering mode or a deferred rendering mode to perform batch rendering on the preset number of primitives according to rendering mode indication information;
wherein, in the immediate rendering mode, the rasterization and depth test unit is configured to:
perform rasterization and depth testing on each primitive in the preset number of primitives;
provide the coverage information of the preset number of primitives to the rendering mode control unit,
wherein, in the deferred rendering mode, the rasterization and depth test unit is configured to:
according to the mask information in the screen sub-block list, perform rasterization processing on the primitives in the screen sub-blocks and perform depth test processing on uncovered pixel points in the primitives in the screen sub-blocks;
provide the coverage information of the preset number of primitives to the rendering mode control unit,
and wherein the rendering mode control unit is configured to provide the rendering mode indication information according to the coverage information of the preset number of primitives.
9. An electronic device, the electronic device comprising:
one or more processors;
a memory for storing executable instructions;
the one or more processors are configured to implement the method of any of claims 1 to 7 via the executable instructions.
10. A computer readable storage medium having stored thereon a computer program which, when executed by a processor, causes the processor to perform the method of any of claims 1 to 7.
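
As an illustration only, the batch loop recited in steps S1 to S3 of claim 1, together with the threshold rule of claim 4, might be sketched in Python as follows. Everything here is a simplifying assumption rather than the claimed implementation: primitives are axis-aligned rectangles `(x0, y0, x1, y1)` standing in for triangles, `overlap_rate` is a hypothetical software stand-in for the hardware coverage counter, and both modes use the same toy measurement instead of real rasterization and depth testing.

```python
from collections import deque
from typing import Callable, Dict, List, Sequence, Tuple

Rect = Tuple[int, int, int, int]  # (x0, y0, x1, y1), x1 and y1 exclusive
IMMEDIATE, DEFERRED = "immediate", "deferred"


def overlap_rate(batch: Sequence[Rect]) -> float:
    """Toy coverage metric: share of pixel writes landing on an already written pixel."""
    seen = set()
    total = repeated = 0
    for x0, y0, x1, y1 in batch:
        for y in range(y0, y1):
            for x in range(x0, x1):
                total += 1
                if (x, y) in seen:
                    repeated += 1
                else:
                    seen.add((x, y))
    return repeated / total if total else 0.0


def render_all(primitives: Sequence[Rect], batch_size: int, threshold: float,
               renderers: Dict[str, Callable[[Sequence[Rect]], float]],
               initial_mode: str = IMMEDIATE) -> List[str]:
    """Render every cached primitive in batches; return the mode used per batch."""
    cache = deque(primitives)
    mode = initial_mode  # set before S2 runs for the first time (claim 2)
    history: List[str] = []
    while cache:  # S3: repeat until the cache is empty
        count = min(batch_size, len(cache))
        batch = [cache.popleft() for _ in range(count)]  # S1: fetch one batch
        rate = renderers[mode](batch)  # S2: render in the indicated mode
        history.append(mode)
        # Indication for the next batch: deferred if rate >= threshold.
        mode = DEFERRED if rate >= threshold else IMMEDIATE
    return history
```

A caller would pass one renderer per mode, for example `render_all(prims, 2, 0.5, {IMMEDIATE: overlap_rate, DEFERRED: overlap_rate})`; a real renderer would rasterize and depth-test the batch and return the measured coverage rate.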
CN202311162624.9A 2023-09-11 2023-09-11 Graphics rendering method and processor hardware architecture Withdrawn CN116894906A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311162624.9A CN116894906A (en) 2023-09-11 2023-09-11 Graphics rendering method and processor hardware architecture

Publications (1)

Publication Number Publication Date
CN116894906A true CN116894906A (en) 2023-10-17

Family

ID=88315279

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311162624.9A Withdrawn CN116894906A (en) 2023-09-11 2023-09-11 Graphics rendering method and processor hardware architecture

Country Status (1)

Country Link
CN (1) CN116894906A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20231017