CN117762375B - Data processing method, device, computing device, graphics processor, and storage medium

Info

Publication number: CN117762375B
Application number: CN202311790497.7A
Authority: CN
Inventors: Name withheld by request
Original assignee: Moore Threads Technology Co Ltd
Current assignee: Moore Threads Technology Co Ltd
Priority date: 2023-12-22
Filing date: 2023-12-22
Publication date: 2024-10-29
Anticipated expiration: 2043-12-22
Also published as: CN117762375A

Abstract

The present disclosure relates to a data processing method, a device, a computing device, a graphics processor, and a storage medium. The method is used for a computing unit comprising an operation unit and a shifter, and comprises: the shifter acquires an intermediate calculation result output by the operation unit, wherein the intermediate calculation result is a floating point number; the shifter shifts the mantissa of the intermediate calculation result and outputs a target calculation result, wherein the lowest bit of the mantissa in the target calculation result is the approximate bit (round bit) corresponding to a rounding operation, and the approximate bit is used by the operation unit to round the target calculation result. Embodiments of the application can reduce the processing load on the operation unit, increase processing speed, improve the computing efficiency of the operation unit, and save hardware computing resources and computing time.

Description

Data processing method, device, computing device, graphics processor, and storage medium

Technical Field

The present disclosure relates to the field of processor technologies, and in particular, to a data processing method, device, computing device, graphics processor, and storage medium.

Background

Floating point calculation in current mainstream computer chips complies with a corresponding standard, such as the IEEE754 standard: whether or not the calculation result is a normalized number, the result is finally output in the format specified by the standard. During calculation, a floating point result that does not conform to the standard format is often obtained, so the result needs to be processed to obtain one that conforms to the standard format.

With results processed according to conventional methods, the sticky bits corresponding to the rounding operation may occur either within or outside the range of the mantissa of the calculation result. Thus, the operation unit needs to handle two different sticky bit cases when performing the rounding operation. Furthermore, if the computing unit simultaneously supports floating point calculation results in multiple target formats, the reserved bit (guard bit), the approximate bit (round bit), and the sticky bit of the mantissa may appear in different positions in the results obtained by the conventional method, so the computing unit needs to select among them according to the target format before the rounding operation, which increases the burden on hardware resources and reduces hardware efficiency. Therefore, a new method is needed to save hardware computing resources and computing time.

Disclosure of Invention

In view of this, the present disclosure proposes a data processing method, apparatus, computing apparatus, graphics processor, and storage medium.

According to an aspect of the present disclosure, a data processing method is provided. The method may be used in a computing unit comprising an operation unit and a shifter, and may comprise:

The shifter acquires an intermediate calculation result output by the operation unit, wherein the intermediate calculation result is a floating point number;

The shifter shifts the mantissa of the intermediate calculation result, outputs a target calculation result, wherein the lowest bit of the mantissa in the target calculation result is an approximate bit corresponding to rounding operation, and the approximate bit is used for rounding operation of the target calculation result by the operation unit.

In one possible implementation, the shifter performs a shift process on mantissas of the intermediate calculation result, and outputs a target calculation result, including:

the shifter determines a target number of bits for shifting the mantissa of the intermediate calculation result based on the first number of bits for shifting the mantissa of the intermediate calculation result to the left/right and the second number of bits for shifting the mantissa of the intermediate calculation result to the right;

the shifter shifts the mantissa of the intermediate calculation result based on the target number of bits, and outputs the target calculation result.

In one possible implementation, the second number of bits is determined based on a difference between the mantissa bit width of the intermediate calculation result and the mantissa bit width of the target floating point number format.

In one possible implementation, the first number of bits is determined based on the number of leading zeros in the mantissa of the intermediate calculation result in the case where the difference between the exponent corresponding to the most significant bit of the mantissa of the intermediate calculation result and the number of leading zeros in the mantissa of the intermediate calculation result is not less than the exponent minimum value of the target floating point number format.

In one possible implementation, the first number of bits is determined based on a difference between an exponent corresponding to a mantissa most significant bit of the intermediate calculation result and an exponent minimum of the target floating point number format, where the difference between the exponent corresponding to the mantissa most significant bit of the intermediate calculation result and the leading zero number in the mantissa of the intermediate calculation result is less than the exponent minimum of the target floating point number format.

In one possible implementation, the shifter determines a target number of bits to shift the mantissa of the intermediate calculation result based on a first number of bits to shift the mantissa of the intermediate calculation result to the left/right and a second number of bits to shift the mantissa of the intermediate calculation result to the right, comprising:

the shifter determines a target number of bits to shift the mantissa of the intermediate calculation result based on a difference between the first number of bits and the second number of bits.

In one possible implementation, the shifter performs a shift process on mantissas of the intermediate calculation result based on the target number of bits, and outputs the target calculation result, including:

The shifter shifts the mantissa of the intermediate calculation result to the left by the target number of bits in the case where the target number of bits is not less than zero, and outputs the target calculation result; or,

The shifter shifts the mantissa of the intermediate calculation result by the absolute value of the target number of bits to the right when the target number of bits is smaller than zero, and outputs the target calculation result.

In one possible implementation, in the case where the mantissa of the intermediate calculation result is shifted to the left, the sticky bit corresponding to the rounding operation is zero;

in the case where the mantissa of the intermediate calculation result is shifted to the right, the sticky bit corresponding to the rounding operation is the portion shifted out to the right.
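As an illustration only, the two shift directions and the resulting sticky bit might be modeled as follows. This is a minimal Python sketch, not the disclosed hardware: the function name `shift_with_sticky` is hypothetical, the mantissa is modeled as a non-negative integer, and collapsing the shifted-out portion into a single sticky flag is an assumption made here for brevity.

```python
from typing import Tuple


def shift_with_sticky(mantissa: int, target_bits: int) -> Tuple[int, int]:
    """Shift a mantissa by a signed bit count and report a sticky flag.

    target_bits >= 0 shifts left, target_bits < 0 shifts right by its
    absolute value. The sticky flag is 0 for a left shift; for a right
    shift it is 1 if any 1 bit was shifted out (an assumed convention).
    """
    if target_bits >= 0:
        # Left shift: nothing falls off the low end, so sticky is zero.
        return mantissa << target_bits, 0
    k = -target_bits
    shifted_out = mantissa & ((1 << k) - 1)  # the low k bits that are dropped
    return mantissa >> k, 1 if shifted_out else 0
```

For instance, shifting `0b101101` right by two bits drops `0b01`, so the sticky flag is set, whereas shifting `0b101100` right by two bits drops only zeros.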

In one possible implementation, the target floating point number format is a format specified by the IEEE754 standard.

In one possible implementation, the computing unit includes an arithmetic logic unit ALU of the graphics processor GPU.

According to another aspect of the present disclosure, a computing device is provided. The computing device may include:

The shifter is used for acquiring an intermediate calculation result output by the operation unit, wherein the intermediate calculation result is a floating point number, shifting the mantissa of the intermediate calculation result, and outputting a target calculation result, wherein the lowest bit of the mantissa in the target calculation result is the approximate bit corresponding to a rounding operation;

and the operation unit is used for outputting the intermediate calculation result and rounding the target calculation result.

In one possible implementation, the shifting the mantissa of the intermediate calculation result, and outputting the target calculation result includes:

Determining a target number of bits for shifting the mantissa of the intermediate calculation result based on the first number of bits for shifting the mantissa of the intermediate calculation result to the left/right and the second number of bits for shifting the mantissa of the intermediate calculation result to the right;

and performing shift processing on mantissas of the intermediate calculation result based on the target bit number, and outputting the target calculation result.

In one possible implementation, determining a target number of bits to shift the mantissa of the intermediate calculation result based on a first number of bits to shift the mantissa of the intermediate calculation result to the left/right and a second number of bits to shift the mantissa of the intermediate calculation result to the right includes:

Based on the difference between the first number of bits and the second number of bits, a target number of bits to shift the mantissa of the intermediate calculation result is determined.

In one possible implementation, the shifting the mantissa of the intermediate calculation result based on the target number of bits, and outputting the target calculation result includes:

In the case where the target number of bits is not less than zero, shifting the mantissa of the intermediate calculation result to the left by the target number of bits, and outputting the target calculation result; or,

And when the target bit number is smaller than zero, shifting the mantissa of the intermediate calculation result to the right by the absolute value of the target bit number, and outputting the target calculation result.

According to another aspect of the present disclosure, there is provided a graphics processor including the computing device described above.

According to another aspect of the present disclosure, there is provided a data processing apparatus including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to implement the above-described method when executing the instructions stored by the memory.

According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer program instructions, wherein the computer program instructions, when executed by a processor, implement the above-described method.

According to another aspect of the present disclosure, there is provided a computer program product comprising a computer readable code, or a non-transitory computer readable storage medium carrying computer readable code, which when run in a processor of a data processing apparatus, performs the above method.

According to the embodiments of the application, after the shifter shifts the mantissa of the intermediate calculation result output by the operation unit, the lowest bit of the mantissa in the target calculation result is the approximate bit corresponding to the rounding operation, and the sticky bit then lies beyond the lowest bit. As a result, the operation unit only needs to handle a single sticky bit case when rounding the target calculation result, and the reserved bit, the approximate bit, and the sticky bit do not need to be selected from different positions according to the target floating point number format but can be extracted directly. By improving the working mode of the shifter and the way the shifter cooperates with the operation unit, the processing load on the operation unit is reduced, the processing speed is increased, the computing efficiency of the operation unit is improved, and hardware computing resources and computing time are saved.

Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments, features and aspects of the present disclosure and together with the description, serve to explain the principles of the disclosure.

Fig. 1 shows a schematic diagram of a shift operation.

Fig. 2 shows another schematic diagram of a shift operation.

FIG. 3 shows a schematic diagram of different floating point number formats.

Fig. 4 shows a block diagram of a computing unit according to an embodiment of the application.

Fig. 5 shows a flow chart of a data processing method according to an embodiment of the application.

Fig. 6 shows a flow chart of a data processing method according to an embodiment of the application.

FIG. 7 illustrates a schematic diagram of mantissa left shift of an intermediate calculation result according to an embodiment of the present application.

FIG. 8 shows a schematic diagram of mantissa right shift of an intermediate calculation result according to an embodiment of the present application.

FIG. 9 illustrates a block diagram of a computing device according to an embodiment of the application.

FIG. 10 is a block diagram illustrating an apparatus 1900 for data processing according to an example embodiment.

Detailed Description

Various exemplary embodiments, features and aspects of the disclosure will be described in detail below with reference to the drawings. In the drawings, like reference numbers indicate identical or functionally similar elements. Although various aspects of the embodiments are illustrated in the accompanying drawings, the drawings are not necessarily drawn to scale unless specifically indicated.

The word "exemplary" is used herein to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.

In addition, numerous specific details are set forth in the following detailed description in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements, and circuits well known to those skilled in the art have not been described in detail in order not to obscure the present disclosure.

Floating point calculation in current mainstream computer chips complies with a corresponding standard, such as the IEEE754 standard: whether or not the calculation result is a normalized number, the result is finally output in the format specified by the standard. During calculation, a floating point result that does not conform to the standard format is often obtained. It is therefore necessary to process the result to obtain one that conforms to the standard format.

One example is shown in Fig. 1, which shows a schematic diagram of a shift operation. As shown in Fig. 1, in a floating point multiply-add, after the multiply-add of the mantissas (shown as mat0×mat1+mat2 in the figure), the resulting mantissa has several 0s in its high-order bits. The conventional way to process such a result is to shift it to the left so that the first 1 in the high-order bits moves to the highest bit of the calculation result, then take the valid bits required by the standard (the output part shown in the figure) from the high-order end according to the target floating point format, and extract the reserved bit (guard bit), approximate bit (round bit), and sticky bit for the rounding operation, finally obtaining the mantissa of a normalized floating point result that conforms to the standard. In this case, the sticky bits of the rounding operation occur within the range of the result mantissa.
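The conventional left-normalization step described above can be sketched roughly as follows. This Python fragment is an illustrative assumption rather than anything taken from the figures: the function name `normalize_left` is hypothetical, the mantissa is modeled as a non-zero integer held in an `n`-bit field, and the leading zeros are counted with Python's `int.bit_length`.

```python
from typing import Tuple


def normalize_left(mantissa: int, n: int) -> Tuple[int, int]:
    """Shift an n-bit mantissa left until its first 1 reaches the top bit.

    Returns the shifted mantissa and the number of leading zeros removed.
    Assumes 0 < mantissa < 2**n.
    """
    leading_zeros = n - mantissa.bit_length()  # 0s above the first 1
    shifted = (mantissa << leading_zeros) & ((1 << n) - 1)
    return shifted, leading_zeros
```

With an 8-bit field, `0b00010110` has three leading zeros and becomes `0b10110000` after the shift; the output bits, reserved bit, approximate bit, and sticky bits would then be read from positions that depend on the target format.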

Another example is shown in Fig. 2, which shows another schematic diagram of a shift operation. As shown in Fig. 2, the absolute value of the floating point calculation result may be smaller than the normalized range specified by the IEEE754 standard format. For example, in the figure the most significant bit of the mantissa after the multiply-add is 1, but the exponent is less than the exponent minimum of the target floating point number format. The conventional way to process such a result is to shift the mantissa to the right, thereby adjusting the exponent to the exponent minimum of the target floating point number format. The mantissa in the target floating point number format is then taken from the leading segment of the resulting mantissa (the output part shown in the figure), and the reserved bit, approximate bit, and sticky bit are extracted for the rounding operation, finally obtaining the mantissa of a denormalized floating point result that conforms to the standard. In this case, some of the sticky bits of the rounding operation may occur outside the range of the result mantissa.
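A rough model of this conventional right shift is sketched below. It is an assumption-laden illustration: the name `denormalize_right` is hypothetical, the mantissa is an integer, `exp` is taken to be the exponent attached to the mantissa's most significant bit, and the bits that fall off the low end are returned separately to show that they leave the mantissa field.

```python
from typing import Tuple


def denormalize_right(mantissa: int, exp: int, exp_min: int) -> Tuple[int, int, int]:
    """Shift right until the exponent reaches exp_min.

    Returns (shifted mantissa, adjusted exponent, bits shifted out).
    If exp is already >= exp_min, nothing is shifted.
    """
    k = exp_min - exp
    if k <= 0:
        return mantissa, exp, 0
    shifted_out = mantissa & ((1 << k) - 1)  # bits pushed past the low end
    return mantissa >> k, exp_min, shifted_out
```

For example, with `exp = -130` and `exp_min = -126` the mantissa moves four places to the right, and any 1s among the four dropped bits would have to be tracked outside the mantissa field for rounding.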

With the results of the above conventional methods, the sticky bits corresponding to the rounding operation may occur either within or outside the range of the mantissa of the calculation result. Thus, the operation unit needs to handle two different sticky bit cases when performing the rounding operation.

Furthermore, if the computing unit simultaneously supports floating point calculation results in multiple target formats, the reserved bit, the approximate bit, and the sticky bit of the mantissa may appear in different positions in the results obtained by the conventional method; see Fig. 3, which shows a schematic diagram of different floating point number formats. As shown in Fig. 3, when the computing unit supports multiple target floating point number formats at the same time, the reserved bit, approximate bit, and sticky bit corresponding to the rounding operation may appear in different positions for different formats, so the computing unit needs to select among them according to the target format in order to perform the rounding operation, which increases the burden on hardware resources and reduces hardware efficiency. Therefore, a new method is needed to save hardware computing resources and computing time.

In view of the foregoing, embodiments of the present application provide a data processing method, apparatus, computing apparatus, graphics processor, and storage medium. According to the method, after the shifter shifts the mantissa of the intermediate calculation result output by the operation unit, the lowest bit of the mantissa in the target calculation result is the approximate bit corresponding to the rounding operation, and the sticky bit then lies beyond the lowest bit. As a result, the operation unit only needs to handle a single sticky bit case when rounding the target calculation result, and the reserved bit, the approximate bit, and the sticky bit do not need to be selected from different positions according to the target floating point number format but can be extracted directly. By improving the working mode of the shifter and the way the shifter cooperates with the operation unit, the processing load on the operation unit is reduced, the processing speed is increased, the computing efficiency of the operation unit is improved, and hardware computing resources and computing time are saved.

Fig. 4 shows a block diagram of a computing unit according to an embodiment of the application. As shown in Fig. 4, the method of the embodiment of the present application may be used in a computing unit 400, which may be an arithmetic logic unit (ALU) included in any processor, such as a graphics processing unit (GPU), a central processing unit (CPU), and the like.

The computing unit may include an operation unit 401 and a shifter 402. The operation unit may include arithmetic units (such as an adder, a multiplier, etc.) and a logic unit, may be used to perform the corresponding arithmetic operations and output intermediate calculation results, and may also be used to perform rounding operations on data; the shifter may be used to perform shift operations on data.

Optionally, the computing unit may further include other subunits, such as registers, etc., which the present application is not limited to.

The data processing method according to the embodiment of the present application is described below on the basis of the configuration of the computing unit shown in fig. 4.

Fig. 5 shows a flow chart of a data processing method according to an embodiment of the application. The data processing method may be used for a computing unit, which may comprise an arithmetic unit and a shifter, as shown in fig. 5, the method may comprise:

in step S501, the shifter acquires the intermediate calculation result output by the operation unit.

For an example of the shifter, see shifter 402 in Fig. 4; for an example of the operation unit, see operation unit 401 in Fig. 4.

Wherein the intermediate calculation result is a floating point number. For example, it may be a multiplication result output from a multiplier of the operation unit. The intermediate calculation result may be a floating point number that does not meet the standard format requirements. Thus, the mantissa of the intermediate calculation result may be processed by the shifter so that the mantissa of the intermediate calculation result conforms to the mantissa in the standard format.

In step S502, the shifter shifts the mantissa of the intermediate calculation result and outputs the target calculation result.

After the shifter shifts the mantissa of the intermediate calculation result, the lowest bit of the mantissa in the target calculation result is the approximate bit corresponding to the rounding operation. The approximate bit is the bit immediately below the reserved bit and represents the highest bit discarded during the rounding operation. The reserved bit represents the lowest bit retained in the mantissa part.

The target calculation result output by the shift processing may be a result conforming to a predetermined standard, for example, the IEEE754 standard. Conforming to the predetermined standard may mean that the output target calculation result can be represented in a format prescribed by that standard. For example, under the IEEE754 standard, the value (value) of the output target calculation result after the shift processing may be represented as sign bit (sign) × exponent term (exponent) × mantissa part (mantissa).

The approximate bit may be used by the operation unit to round the target calculation result. The lowest bit of the mantissa in the output target calculation result is the approximate bit corresponding to the rounding operation, the approximate bit is the bit immediately below the reserved bit, and the sticky bits are the p bits below the approximate bit (p is an integer determined according to the mantissa bit width of the intermediate calculation result). That is, once the approximate bit is determined, the positions of the reserved bit and the sticky bit corresponding to the rounding operation are fixed, so the operation unit can directly extract the reserved bit, the approximate bit, and the sticky bit from the corresponding positions for the rounding operation without considering different target floating point number formats (e.g., single precision and double precision floating point numbers have different mantissa lengths). The manner in which the rounding operation is performed can be determined based on the prior art, and different operation units and operation tasks may have different accuracy requirements for rounding. The approximate bit may be used to determine whether to carry into the reserved bit.
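To make the fixed bit positions concrete, the following Python sketch rounds a target calculation result whose lowest bit is the approximate bit. The text leaves the rounding manner to the prior art, so round-to-nearest, ties-to-even is used here purely as an assumed example; the function name `round_nearest_even` is hypothetical, and the sticky information is passed in as a single flag rather than as p separate bits.

```python
def round_nearest_even(target: int, sticky: int) -> int:
    """Round a shifted mantissa whose bit 0 is the approximate (round) bit.

    target: integer mantissa; bit 0 is the approximate bit and bit 1 is
            the reserved bit (lowest retained bit).
    sticky: 1 if any lower discarded bit was 1, else 0.
    Returns the retained mantissa after rounding. A carry out of the top
    bit is not renormalized here (that would also adjust the exponent).
    """
    approximate_bit = target & 1      # highest discarded bit
    kept = target >> 1                # retained mantissa bits
    reserved_bit = kept & 1           # lowest retained bit
    # Round up when above the halfway point, or exactly halfway and odd.
    if approximate_bit and (sticky or reserved_bit):
        kept += 1
    return kept
```

Because the approximate bit always sits at bit 0, the same extraction logic applies regardless of the target format's mantissa width.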

According to the embodiments of the application, after the shifter shifts the mantissa of the intermediate calculation result output by the operation unit, the lowest bit of the mantissa in the target calculation result is the approximate bit corresponding to the rounding operation, and the sticky bit then lies beyond the lowest bit. As a result, the operation unit only needs to handle a single sticky bit case when rounding the target calculation result, and the reserved bit, the approximate bit, and the sticky bit do not need to be selected from different positions according to the target floating point number format but can be extracted directly, which improves the computing efficiency of the operation unit and saves hardware computing resources and computing time.

The manner in which the shifter shifts the mantissa of the intermediate calculation result is described in detail below. So that the lowest bit of the mantissa after the shift processing is the approximate bit corresponding to the rounding operation, the number of bits by which to shift the mantissa may be determined jointly from the first number of bits, by which the mantissa is shifted to the left/right, and the second number of bits, by which the mantissa is shifted to the right. Fig. 6 shows a flow chart of a data processing method according to an embodiment of the application. As shown in Fig. 6, step S502 may include:

In step S601, the shifter determines the target number of bits for shifting the mantissa of the intermediate calculation result based on the first number of bits for shifting the mantissa of the intermediate calculation result to the left/right and the second number of bits for shifting the mantissa of the intermediate calculation result to the right.

In S601, the order of determining the first number of bits and the second number of bits is not fixed, and the first number of bits for moving the mantissa of the intermediate calculation result to the left/right may be determined first, and then the second number of bits for moving the mantissa of the intermediate calculation result to the right may be determined, or the second number of bits may be determined first, and then the first number of bits may be determined.

The intermediate calculation result obtained may fall into one of two cases, for which reference may be made to Fig. 1 and Fig. 2, respectively. In the first case, the mantissa of the intermediate calculation result has several 0s in its high-order bits, and the difference between the exponent corresponding to the most significant bit of the mantissa and the number of leading zeros in the mantissa is not less than the exponent minimum of the target floating point number format; in the second case, that difference is less than the exponent minimum of the target floating point number format. The first number of bits for shifting the mantissa to the left/right can thus be calculated separately for these two cases.

In the case where the first number of bits is not less than 0, it can be determined that the mantissa of the intermediate calculation result is shifted to the left; in the case where the first number of bits is less than 0, it can be determined that the mantissa of the intermediate calculation result is shifted to the right.

Optionally, in the case where the difference between the exponent corresponding to the most significant bit of the mantissa of the intermediate calculation result and the number of leading zeros in the mantissa of the intermediate calculation result is not less than the exponent minimum of the target floating point number format, the first number of bits may be determined based on the number of leading zeros in the mantissa of the intermediate calculation result.

Taking the normalized floating point number format under the IEEE754 standard as the target floating point number format, for example, this case means that the intermediate calculation result is not smaller than the minimum value that a normalized floating point number can represent under the IEEE754 standard.

The number of leading zeros (may be referred to as z) in the mantissa of the intermediate calculation result may refer to the number of 0s before the most significant 1 in the mantissa. The exponent minimum (may be referred to as exp_min) of the target floating point number format is determined by the target floating point number format, and may also be used to determine the minimum floating point number that can be represented in exponent form. The target floating point number format may be a format specified by the IEEE754 standard. For example, under the IEEE754 standard, the target floating point number format may include single precision floating point numbers, for which the exponent minimum is -126, and double precision floating point numbers, for which the exponent minimum is -1022. The exponent is represented in binary with a bit width that depends on the format, e.g., the exponent bit width of a double precision floating point number is 11.

In the case where the difference between the exponent corresponding to the mantissa most significant bit of the intermediate calculation result and the leading zero number in the mantissa of the intermediate calculation result is less than the exponent minimum value of the target floating point number format, the first number of bits may be determined based on the difference between the exponent corresponding to the mantissa most significant bit of the intermediate calculation result and the exponent minimum value of the target floating point number format.

Taking the normalized floating point number format under the IEEE754 standard as the target floating point number format, for example, this case means that the intermediate calculation result is smaller than the minimum value that a normalized floating point number can represent under the IEEE754 standard.

The exponent corresponding to the most significant bit of the mantissa of the intermediate calculation result (may be referred to as exp) may refer to the exponent corresponding to the most significant bit of the mantissa when the mantissa of the intermediate calculation result is unprocessed.

For intermediate calculation results in two different cases, the second number of bits may be determined based on a difference between the mantissa bit width of the intermediate calculation result and the mantissa bit width of the target floating point number format.

Wherein, the mantissa bit width (may be referred to as n) of the intermediate calculation result may refer to the bit width size of the intermediate calculation result when the mantissa is unprocessed; the mantissa bit width (which may be referred to as m) of the target floating-point number format may be predetermined, e.g., 23 bits for single precision floating-point numbers and 52 bits for double precision floating-point numbers.
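One way to read the second number of bits is as the right shift that would place the approximate bit of an already normalized n-bit mantissa at bit 0. The Python sketch below illustrates this reading under stated assumptions: the names `second_number_of_bits` and `align_normalized` are hypothetical, the expression n-m-2 is taken from the formulas given in this description, and the mantissa is assumed to have its leading 1 in the top bit of the n-bit field.

```python
def second_number_of_bits(n: int, m: int) -> int:
    """Right-shift count derived from the two mantissa bit widths.

    n: mantissa bit width of the intermediate calculation result.
    m: mantissa bit width of the target format (e.g. 23 or 52).
    """
    return n - m - 2


def align_normalized(mantissa: int, n: int, m: int) -> int:
    """Move a normalized n-bit mantissa so that bit 0 is the approximate bit.

    The result keeps the leading 1, m fraction bits, and one approximate
    bit, i.e. m + 2 bits in total.
    """
    s = second_number_of_bits(n, m)
    # A negative count means the intermediate mantissa is narrower than
    # m + 2 bits, so it moves left instead.
    return mantissa >> s if s >= 0 else mantissa << -s
```

For a hypothetical 48-bit intermediate mantissa, the count is 23 for a single precision target (m = 23), and the leading 1 lands at bit 24, leaving exactly 25 bits.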

In step S602, the shifter shifts the mantissa of the intermediate calculation result based on the number of target bits, and outputs the target calculation result.

The shifter may determine a target number of bits to shift the mantissa of the intermediate calculation result based on a difference between the first number of bits and the second number of bits, see below.

Under the condition that the difference value between the exponent corresponding to the mantissa most significant bit of the intermediate calculation result and the leading zero number in the mantissa of the intermediate calculation result is not smaller than the exponent minimum value of the target floating point number format, namely exp-z is not smaller than exp_min, one calculation mode of the target bit number can be seen in the formula (1):

shf_val = z - n + m + 2    Formula (1)

Wherein shf_val may represent the target number of bits, z represents the number of leading zeros in the mantissa of the intermediate calculation result, n represents the mantissa bit width of the intermediate calculation result, and m represents the mantissa bit width of the target floating point number format. The first number of bits is z and the second number of bits is n-m-2.

In the case where the difference between the exponent corresponding to the mantissa most significant bit of the intermediate calculation result and the leading zero number in the mantissa of the intermediate calculation result is smaller than the exponent minimum value of the target floating point number format, that is, exp-z < exp_min, one calculation manner of the target number of bits may be seen in formula (2):

shf_val = exp - exp_min - n + m + 2    Formula (2)

Wherein exp represents an exponent corresponding to a mantissa most significant bit of the intermediate calculation result, exp_min represents an exponent minimum value of the target floating point number format. The first number of bits is exp-exp_min, and the second number of bits is n-m-2.
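As a sketch of how formula (1) and formula (2) fit together, the following Python function selects the first number of bits according to the comparison between exp - z and exp_min, and then subtracts the second number of bits n - m - 2. The function name `target_shift` and the sample operand widths are assumptions made for illustration only.

```python
def target_shift(z: int, n: int, m: int, exp: int, exp_min: int) -> int:
    """Return shf_val, the target number of bits.

    z       : leading zeros in the mantissa of the intermediate result
    n       : mantissa bit width of the intermediate result
    m       : mantissa bit width of the target floating point format
    exp     : exponent corresponding to the mantissa most significant bit
    exp_min : exponent minimum of the target floating point format
    """
    second = n - m - 2  # second number of bits, same in both cases
    if exp - z >= exp_min:
        first = z  # formula (1): shf_val = z - n + m + 2
    else:
        first = exp - exp_min  # formula (2): shf_val = exp - exp_min - n + m + 2
    return first - second


# Hypothetical 48-bit intermediate mantissa, single precision target (m = 23).
print(target_shift(z=1, n=48, m=23, exp=0, exp_min=-126))     # formula (1) branch
print(target_shift(z=5, n=48, m=23, exp=-125, exp_min=-126))  # formula (2) branch
```

A negative return value corresponds to a right shift and a non-negative value to a left shift.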

Optionally, the step S602 may include:

In the case where the target number of bits is not less than zero, that is, shf_val is not less than 0, the shifter shifts the mantissa of the intermediate calculation result to the left by the target number of bits, that is, by shf_val bits, and outputs the target calculation result; otherwise, in the case where the target number of bits is less than zero, that is, shf_val < 0, the shifter shifts the mantissa of the intermediate calculation result to the right by the absolute value of the target number of bits, that is, by -shf_val bits, and outputs the target calculation result.
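The direction rule can be sketched in a few lines of Python. The function name `shift_mantissa` is hypothetical, and the mantissa is modeled as an unbounded non-negative integer, so register width and overflow handling of a real shifter are deliberately left out.

```python
def shift_mantissa(mantissa: int, shf_val: int) -> int:
    """Shift left by shf_val when shf_val >= 0, otherwise shift right
    by the absolute value of shf_val. Bits shifted out to the right
    are simply dropped here."""
    if shf_val >= 0:
        return mantissa << shf_val
    return mantissa >> (-shf_val)


print(bin(shift_mantissa(0b0011, 2)))     # left shift by 2
print(bin(shift_mantissa(0b110101, -3)))  # right shift by 3
```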

In the case where the mantissa of the intermediate calculation result is shifted to the left, the sticky bit corresponding to the rounding operation is zero. Referring to FIG. 7, a schematic diagram of a left shift of the mantissa of an intermediate calculation result according to one embodiment of the present application is shown. As shown in fig. 7, when the mantissa of the intermediate calculation result is shifted to the left, the lowest bit of the mantissa after the shift operation is the approximate bit; the output portion of the mantissa extends, as shown in the figure, from the most significant bit after the shift to the bit immediately following the reserved bits, and the approximate bit is the last bit of the output portion. There is no sticky bit in this case.

In the case where the mantissa of the intermediate calculation result is shifted to the right, the sticky bit corresponding to the rounding operation is the portion shifted out to the right. Referring to FIG. 8, a schematic diagram of a right shift of the mantissa of an intermediate calculation result according to one embodiment of the present application is shown. As shown in fig. 8, when the mantissa of the intermediate calculation result is shifted to the right, the lowest bit of the mantissa after the shift operation is the approximate bit; the output portion of the mantissa extends, as shown in the figure, from the most significant bit after the shift to the bit immediately following the reserved bits, the approximate bit is the last bit of the output portion, and the sticky bit consists of all bits after the approximate bit.
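The two sticky bit cases can be sketched as follows. Collapsing the shifted-out portion into a single flag by OR-ing its bits is a common convention and is an assumption here, since the description only states that the sticky bit is the portion shifted out to the right. The function name `sticky_from_shift` is hypothetical.

```python
def sticky_from_shift(mantissa: int, shf_val: int) -> int:
    """Return the sticky flag implied by shifting `mantissa` by shf_val.

    Left shift (shf_val >= 0): nothing is shifted out, so sticky is 0.
    Right shift (shf_val < 0): sticky is 1 if any of the -shf_val bits
    shifted out to the right is 1 (assumed OR-reduction)."""
    if shf_val >= 0:
        return 0
    dropped = mantissa & ((1 << -shf_val) - 1)  # bits below the approximate bit
    return 1 if dropped else 0


print(sticky_from_shift(0b110101, -3))  # dropped bits are 101
print(sticky_from_shift(0b110000, -3))  # dropped bits are 000
print(sticky_from_shift(0b110101, 2))   # left shift, no sticky bit
```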

As can be seen from fig. 7 and fig. 8, after the shift processing is performed according to the method of the embodiment of the present application, the sticky bits all lie outside the lowest bit of the calculation window (which corresponds to the mantissa portion). The operation unit therefore does not need to consider sticky bits inside the calculation window when it subsequently performs the rounding operation, which improves the computing efficiency of the operation unit.

Therefore, the subsequent operation unit can directly extract the approximate bit, and round the mantissa of the target calculation result based on the approximate bit, so that the result of the rounding operation can meet the requirement of the subsequent operation. In the process, different target floating point number formats are not needed to be considered, so that the calculation time and calculation resources of the operation unit are saved.
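A minimal sketch of such a rounding step is given below. The description does not fix a rounding mode, so round-to-nearest-even is assumed purely for illustration; the function name `round_with_approx_bit` and the separate `sticky` argument are likewise hypothetical. The input is the shifted mantissa whose lowest bit is the approximate bit.

```python
def round_with_approx_bit(target_mantissa: int, sticky: int) -> int:
    """Round the shifted mantissa using its lowest bit as the approximate bit.

    Assumed mode: round to nearest, ties to even. The returned value no
    longer contains the approximate bit. A carry out of the top bit is not
    renormalized here; the caller would need to adjust the exponent."""
    approx = target_mantissa & 1  # approximate bit is always the lowest bit
    kept = target_mantissa >> 1   # bits that remain in the result
    if approx and (sticky or (kept & 1)):
        kept += 1                 # above the halfway point, or a tie with odd result
    return kept


print(round_with_approx_bit(0b1011, 0))  # tie, kept part 101 is odd
print(round_with_approx_bit(0b1001, 0))  # tie, kept part 100 is even
print(round_with_approx_bit(0b1001, 1))  # above halfway because sticky is set
```

Because the approximate bit always sits at bit 0, the same extraction works for any target format, which is the point the paragraph above makes.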

FIG. 9 illustrates a block diagram of a computing device according to an embodiment of the application. As shown in fig. 9, the computing device may include:

The shifter 901 is configured to obtain an intermediate calculation result output by the operation unit, where the intermediate calculation result is a floating point number, perform shift processing on a mantissa of the intermediate calculation result, and output a target calculation result, where a lowest bit of the mantissa in the target calculation result is an approximate bit corresponding to the rounding operation;

The operation unit 902 is configured to output an intermediate calculation result and perform rounding operation on a target calculation result.

In some embodiments, functions or modules included in an apparatus provided by the embodiments of the present disclosure may be used to perform a method described in the foregoing method embodiments, and specific implementations thereof may refer to descriptions of the foregoing method embodiments, which are not repeated herein for brevity.

Embodiments of the present disclosure also provide a graphics processor including the computing device described above.

The disclosed embodiments also provide a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above-described method. The computer readable storage medium may be a volatile or nonvolatile computer readable storage medium.

The embodiment of the disclosure also provides a data processing device, which comprises: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to implement the above-described method when executing the instructions stored by the memory.

Embodiments of the present disclosure also provide a computer program product comprising a computer readable code, or a non-transitory computer readable storage medium carrying computer readable code, which when run in a processor of a data processing apparatus, performs the above method.

FIG. 10 is a block diagram illustrating an apparatus 1900 for data processing according to an example embodiment. For example, apparatus 1900 may be provided as a server or terminal device on which the computing unit of the application may be deployed. Referring to fig. 10, the apparatus 1900 includes a processing component 1922 that further includes one or more processors and memory resources represented by memory 1932 for storing instructions, such as application programs, that are executable by the processing component 1922. The application programs stored in memory 1932 may include one or more modules each corresponding to a set of instructions. Further, processing component 1922 is configured to execute instructions to perform the methods described above.

The apparatus 1900 may further comprise a power component 1926 configured to perform power management of the apparatus 1900, a wired or wireless network interface 1950 configured to connect the apparatus 1900 to a network, and an input/output interface 1958 (I/O interface). The apparatus 1900 may operate based on an operating system stored in memory 1932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.

In an exemplary embodiment, a non-transitory computer readable storage medium is also provided, such as memory 1932, including computer program instructions executable by processing component 1922 of apparatus 1900 to perform the above-described methods.

The present disclosure may be a system, method, and/or computer program product. The computer program product may include a computer readable storage medium having computer readable program instructions embodied thereon for causing a processor to implement aspects of the present disclosure.

The computer readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium include the following: portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disk read-only memory (CD-ROM), digital versatile disks (DVD), memory sticks, floppy disks, mechanical coding devices such as punch cards or raised structures in a groove having instructions stored thereon, and any suitable combination of the foregoing. Computer readable storage media, as used herein, are not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., optical pulses through fiber optic cables), or electrical signals transmitted through wires.

The computer readable program instructions described herein may be downloaded from a computer readable storage medium to a respective computing/processing device or to an external computer or external storage device over a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmissions, wireless transmissions, routers, firewalls, switches, gateway computers and/or edge servers. The network interface card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium in the respective computing/processing device.

The computer program instructions for performing the operations of the present disclosure may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++, or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer readable program instructions may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, aspects of the present disclosure are implemented by personalizing electronic circuitry, such as programmable logic circuitry, field programmable gate arrays (FPGAs), or programmable logic arrays (PLAs), with state information of the computer readable program instructions, and the electronic circuitry can execute the computer readable program instructions.

Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.

These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable medium having the instructions stored therein includes an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.

The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The foregoing description of the embodiments of the present disclosure has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the various embodiments described. The terminology used herein was chosen in order to best explain the principles of the embodiments, the practical application, or the technical improvements in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims

1. A data processing method for a computing unit comprising an arithmetic unit and a shifter, the method comprising:

The shifter acquires an intermediate calculation result output by the operation unit, wherein the intermediate calculation result is a floating point number;

The shifter shifts the mantissa of the intermediate calculation result and outputs a target calculation result, wherein the lowest bit of the mantissa in the target calculation result is an approximate bit corresponding to a rounding operation, the sticky bit corresponding to the rounding operation lies outside the lowest bit of the mantissa, and the approximate bit is used by the operation unit to perform the rounding operation on the target calculation result.

2. The method of claim 1, wherein the shifter shifts mantissas of the intermediate calculation results, and outputs target calculation results, comprising:

the shifter determining a target number of bits for shifting the mantissa of the intermediate calculation result based on a first number of bits for shifting the mantissa of the intermediate calculation result to the left/right and a second number of bits for shifting the mantissa of the intermediate calculation result to the right, the first number of bits being determined based on the number of leading zeros in the mantissa of the intermediate calculation result, or based on a difference between an exponent corresponding to the most significant bit of the mantissa of the intermediate calculation result and an exponent minimum of a target floating point number format;

And the shifter shifts the mantissa of the intermediate calculation result based on the target bit number and outputs a target calculation result.

3. The method of claim 2, wherein the second number of bits is determined based on a difference between a mantissa bit width of the intermediate calculation result and a mantissa bit width of the target floating point number format.

4. The method of claim 2, wherein the first number of bits is determined based on the number of leading zeros in the mantissa of the intermediate calculation result in the case where the difference between the exponent corresponding to the most significant bit of the mantissa of the intermediate calculation result and the number of leading zeros in the mantissa of the intermediate calculation result is not less than the exponent minimum of the target floating point number format.

5. The method of claim 2, wherein the first number of bits is determined based on a difference between an exponent corresponding to a mantissa most significant bit of the intermediate calculation result and an exponent minimum of the target floating point format, in the case where the difference between the exponent corresponding to the mantissa most significant bit of the intermediate calculation result and a leading zero number in the mantissa of the intermediate calculation result is less than the exponent minimum of the target floating point format.

6. The method of claim 2, wherein the shifter determining the target number of bits for shifting the mantissa of the intermediate calculation result based on a first number of bits for shifting the mantissa of the intermediate calculation result to the left/right and a second number of bits for shifting the mantissa of the intermediate calculation result to the right comprises:

The shifter determines a target number of bits to shift mantissas of the intermediate calculation result based on a difference between the first number of bits and the second number of bits.

7. The method according to claim 2, wherein the shifter shifts mantissas of the intermediate calculation results based on the target number of bits, and outputs target calculation results, comprising:

The shifter shifts the mantissa of the intermediate calculation result to the left by the target number of bits in the case where the target number of bits is not less than zero, and outputs a target calculation result; otherwise,

the shifter shifts the mantissa of the intermediate calculation result to the right by the absolute value of the target number of bits in the case where the target number of bits is less than zero, and outputs a target calculation result.

8. The method of claim 1, wherein in the event that the mantissa of the intermediate calculation result is shifted to the left, the sticky bit corresponding to the rounding operation is zero;

In the case that the mantissa of the intermediate calculation result moves to the right, the sticky bit corresponding to the rounding operation is a right-shifted-out portion.

9. The method of claim 3, wherein the target floating point number format is a format specified by the IEEE754 standard.

10. The method of claim 1, wherein the computing unit comprises an arithmetic logic unit ALU of a graphics processor GPU.

11. A computing device, the computing device comprising:

The shifter is used for acquiring an intermediate calculation result output by the operation unit, wherein the intermediate calculation result is a floating point number, shifting the mantissa of the intermediate calculation result, and outputting a target calculation result, wherein the lowest bit of the mantissa in the target calculation result is an approximate bit corresponding to a rounding operation, and the sticky bit corresponding to the rounding operation lies outside the lowest bit of the mantissa;

And the operation unit is used for outputting an intermediate calculation result and carrying out the rounding operation on the target calculation result.

12. A graphics processor, comprising: the computing device of claim 11.

13. A data processing apparatus, comprising:

A processor;

A memory for storing processor-executable instructions;

Wherein the processor is configured to implement the method of any one of claims 1 to 10 when executing the instructions stored by the memory.

14. A non-transitory computer readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the method of any of claims 1 to 10.