
CN113723599A - Neural network computing method and device, board card and computer readable storage medium

Info

Publication number: CN113723599A
Authority: CN (China)
Prior art keywords: value, fixed point data, neural network, interval, data
Legal status: Pending
Application number: CN202010457225.5A
Other languages: Chinese (zh)
Inventor: not disclosed (不公告发明人)
Current Assignee: Shanghai Cambricon Information Technology Co Ltd
Original Assignee: Shanghai Cambricon Information Technology Co Ltd
Application filed by Shanghai Cambricon Information Technology Co Ltd
Priority: CN202010457225.5A
Publication: CN113723599A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Neurology (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The disclosure relates to a neural network computing method, device, board card, and computer-readable storage medium. The neural network computing device of the disclosure is included in an integrated circuit device that comprises a universal interconnection interface and other processing devices. The neural network computing device interacts with the other processing devices to jointly complete computing operations specified by a user. The integrated circuit device may further include a storage device connected to the neural network computing device and the other processing devices, respectively, for storing data of the neural network computing device and the other processing devices.

Description

Neural network computing method and device, board card and computer readable storage medium
Technical Field
The present disclosure relates generally to the field of neural networks. More particularly, the present disclosure relates to a neural network computing method, apparatus, board, and computer-readable storage medium.
Background
Modern neural networks keep increasing network depth, width, and resolution in order to process massive amounts of data, so the storage and computation requirements of the models keep growing, and the networks consume extremely large amounts of resources during training and inference; those skilled in the art therefore lack an effective way to reduce energy consumption.
Quantizing a neural network can compress the model without losing accuracy, reduce computation power consumption, and accelerate model inference and training. Quantization replaces the floating-point numbers in the network with low-bit fixed-point numbers, reducing the data width from the original 32 bits (single-precision floating point) to 8 or 4 bits. Because of these advantages, quantized networks have become a common technique in this field.
When training with quantization, the back-propagated gradients must be quantized in addition to the weights and activation values; computing the back-propagated gradients takes roughly twice as much computation as forward propagation. Extensive experiments show that, during fixed-point quantization training, the quantization precision required for the back-propagated gradients is higher than that required for the forward-propagated weights and activation values, i.e., the quantization bit width of the back-propagated gradients must be higher than that of the forward-propagated weights and activation values. As a result, the quantization bit width of the back-propagated gradients currently cannot be made low enough and is generally kept at 16 bits, which limits the benefit of model compression. This has become a main factor limiting fixed-point quantization of the training process at the edge: training cannot be completed effectively, so the neural network cannot be applied.
Therefore, a technical problem exists at present: the existing quantization schemes cannot effectively reduce energy consumption during inference and training.
Disclosure of Invention
In order to at least partially solve the technical problems mentioned in the background, the present disclosure provides a method, an apparatus, a board, and a computer-readable storage medium for neural network computation.
In one aspect, the present disclosure discloses a neural network computing device including a control unit, a quantization unit, and a computation unit. The control unit is used for providing segmentation parameters; the quantization unit is used for quantizing floating point data according to the segmentation parameters to generate fixed point data; the calculating unit is used for calculating the neural network by using the fixed point data.
In another aspect, the present disclosure discloses a multiplication unit including a multiplier, an addition module, and a fixed-point to floating-point converter. The multiplier is used for multiplying first fixed-point data and second fixed-point data to generate a fixed-point product; the addition module is used for adding a plurality of quantization offset coefficients corresponding to the first fixed-point data and the second fixed-point data to generate a quantization offset coefficient sum; the fixed-point to floating-point converter is used for converting the fixed-point product into floating-point data according to the quantization offset coefficient sum.
In another aspect, the present disclosure discloses an integrated circuit device including the neural network computing device or the multiplication unit, and a board including the integrated circuit device.
In another aspect, the present disclosure discloses a method of computing a neural network, comprising: receiving floating-point data for computing the neural network, the floating-point data falling within a numerical distribution; dividing the numerical distribution into a first interval and a second interval based on a division parameter; judging whether the floating-point data falls within the first interval; if so, quantizing the floating-point data according to the division parameter to generate fixed-point data; and calculating the neural network using the fixed-point data.
In another aspect, the present disclosure discloses a method of computing a neural network based on floating-point data, comprising: presetting a plurality of quantization intervals, wherein each quantization interval corresponds to one quantization formula and each quantization formula exhibits a different quantization granularity; determining that the floating-point data falls within a particular one of the plurality of quantization intervals; quantizing the floating-point data according to the quantization formula corresponding to the particular interval to generate fixed-point data; and calculating the neural network using the fixed-point data.
In another aspect, the present disclosure discloses a method of computing a neural network based on floating-point data, comprising: presetting a plurality of quantization intervals; determining that the floating-point data falls within a particular one of the plurality of quantization intervals; quantizing the floating-point data according to the particular interval to generate fixed-point data; setting a flag with N bits in a data structure of the fixed-point data, wherein the flag records the particular interval and N is a positive integer; and calculating the neural network using the fixed-point data.
In another aspect, the present disclosure discloses a method of computing a neural network, comprising: providing a first division parameter and a second division parameter; quantizing first floating-point data according to the first division parameter to generate first fixed-point data; quantizing second floating-point data according to the second division parameter to generate second fixed-point data; multiplying the first fixed-point data and the second fixed-point data to generate intermediate data; and calculating the neural network from the intermediate data.
In another aspect, the present disclosure discloses a method of forward propagation in a neural network, comprising: receiving the activation value and the weight required to compute this layer; providing a first division parameter, a first offset value, and a first division value corresponding to the activation value; providing a second division parameter, a second offset value, and a second division value corresponding to the weight; quantizing the activation value according to the first division parameter, the first offset value, and the first division value to generate first fixed-point data; quantizing the weight according to the second division parameter, the second offset value, and the second division value to generate second fixed-point data; multiplying the first fixed-point data and the second fixed-point data to generate intermediate data; performing floating-point calculation on the intermediate data to generate the activation value of the next layer; and repeating the above steps for each layer to complete the neural network.
In another aspect, the present disclosure discloses a method of back propagation in a neural network, the neural network including weights and weight fixed-point data obtained by fixed-point processing of the weights, comprising: receiving a next-layer error value; providing a division parameter, an offset value, and a division value corresponding to the next-layer error value; quantizing the next-layer error value according to the division parameter, the offset value, and the division value to generate error-value fixed-point data; multiplying the error-value fixed-point data and the weight fixed-point data to generate the gradient of the weight; performing fixed-point calculation on the gradient to generate this layer's error value; and adjusting the weight according to this layer's error value.
In another aspect, the present disclosure discloses a method of training a neural network, comprising: performing forward propagation; calculating a next-layer error value according to the next-layer activation value; performing back propagation; and adjusting the weight according to this layer's error value. The step of performing forward propagation comprises: receiving the activation value and the weight required to compute this layer; quantizing the activation value according to a first division parameter, a first offset value, and a first division value corresponding to the activation value to generate first fixed-point data; quantizing the weight according to a second division parameter, a second offset value, and a second division value corresponding to the weight to generate second fixed-point data; multiplying the first fixed-point data and the second fixed-point data to generate intermediate data; and performing floating-point calculation on the intermediate data to generate the next-layer activation value. The step of performing back propagation comprises: quantizing the next-layer error value according to a third division parameter, a third offset value, and a third division value corresponding to the next-layer error value to generate third fixed-point data; multiplying the second fixed-point data and the third fixed-point data to generate the gradient of the weight; and performing fixed-point calculation on the gradient to generate this layer's error value.
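For readers who want a concrete picture of how the forward- and back-propagation steps above fit together, the following is a minimal NumPy sketch of one quantized training step in the conventional textbook formulation. It is illustrative only: the fake_quant helper, the use of ReLU as the floating-point computation, the step ordering, and all parameter values are assumptions for this sketch and not the quantizer or hardware defined in the detailed description below.

    import numpy as np

    def fake_quant(x, alpha, shift, x_p):
        # Illustrative quantize-dequantize: values with abs(x) < x_p use the finer
        # step 2**(alpha + shift); all other values use the coarser step 2**shift.
        step = np.where(np.abs(x) < x_p, 2.0 ** (alpha + shift), 2.0 ** shift)
        return np.round(x / step) * step

    def training_step(act, w, next_err, lr, p_act, p_w, p_err):
        # Forward propagation: quantize the activation and the weight, multiply,
        # and apply a floating-point computation (here ReLU) for the next layer.
        qa = fake_quant(act, *p_act)
        qw = fake_quant(w, *p_w)
        next_act = np.maximum(qa @ qw, 0.0)
        # Back propagation: quantize the next-layer error, multiply with the
        # quantized weight for this layer's error, and form the weight gradient.
        qe = fake_quant(next_err, *p_err)
        layer_err = qe @ qw.T
        grad_w = qa.T @ qe
        return next_act, layer_err, w - lr * grad_w   # adjusted weight

    rng = np.random.default_rng(0)
    act, w, err = rng.normal(size=(4, 8)), rng.normal(size=(8, 3)), rng.normal(size=(4, 3))
    p = (-5, -6, 0.1)   # assumed (alpha, shift, x_p) for all three tensors
    print(training_step(act, w, err, 1e-2, p, p, p)[2].shape)   # -> (8, 3)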
In another aspect, the present disclosure discloses an electronic device comprising one or more processors and a memory. The memory has stored therein computer-executable instructions that, when executed by the one or more processors, cause the electronic device to perform any of the methods as previously described.
In another aspect, the present disclosure discloses a computer-readable storage medium comprising computer-executable instructions that, when executed by one or more processors, perform any of the methods as previously described.
The present disclosure provides an adaptive, non-uniform, low-bit fixed-point training scheme. By introducing multiple quantization intervals, a non-uniform fixed-point data format, and corresponding hardware, it solves the technical problem that existing quantization schemes cannot effectively reduce energy consumption during inference and training, and achieves the technical effects of reducing the memory footprint of the network model, reducing the resources consumed by model training, and improving training speed.
Drawings
The above and other objects, features and advantages of exemplary embodiments of the present disclosure will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. In the drawings, several embodiments of the disclosure are illustrated by way of example and not by way of limitation, and like or corresponding reference numerals indicate like or corresponding parts and in which:
FIG. 1 is an input-output diagram illustrating a sigmoid function;
FIG. 2 is a diagram showing a four-layer structure of a neural network;
FIG. 3 is a schematic diagram illustrating a neural network computing device of an embodiment of the present disclosure;
FIG. 4 is a schematic diagram illustrating a control unit of an embodiment of the present disclosure;
FIG. 5 is a graph showing a possible numerical distribution of input data for an embodiment of the present disclosure;
FIG. 6 is a schematic diagram illustrating an 8-bit non-uniform fixed point quantization data structure of an embodiment of the present disclosure;
FIG. 7 is a flow chart illustrating a method of computing a neural network of another embodiment of the present disclosure;
FIG. 8 is a schematic diagram showing a multiplication unit of another embodiment of the present disclosure;
FIG. 9 is a flow diagram illustrating a method of performing a multiplicative computation neural network according to another embodiment of the present disclosure;
FIG. 10 is a schematic diagram illustrating a non-uniform fixed point quantization data structure of another embodiment of the present disclosure;
FIG. 11 is a graph showing a possible numerical distribution of input data for another embodiment of the present disclosure;
FIG. 12 is a schematic diagram illustrating a neural network computing device of another embodiment of the present disclosure;
FIG. 13 is a flow chart illustrating a method of forward propagation of another embodiment of the present disclosure;
FIG. 14 is a flow chart illustrating a method of back propagation of another embodiment of the present disclosure;
FIG. 15 is a block diagram illustrating an integrated circuit device of another embodiment of the present disclosure; and
fig. 16 is a structural diagram showing a board card according to another embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are some, but not all embodiments of the present disclosure. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.
It should be understood that the terms "first," "second," "third," and "fourth," etc. in the claims, description, and drawings of the present disclosure are used to distinguish between different objects and are not used to describe a particular order. The terms "comprises" and "comprising," when used in the specification and claims of this disclosure, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the disclosure herein is for the purpose of describing particular embodiments only, and is not intended to be limiting of the disclosure. As used in the specification and claims of this disclosure, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should be further understood that the term "and/or" as used in the specification and claims of this disclosure refers to any and all possible combinations of one or more of the associated listed items and includes such combinations.
As used in this specification and claims, the term "if" may be interpreted contextually as "when", "upon" or "in response to a determination" or "in response to a detection".
Specific embodiments of the present disclosure are described in detail below with reference to the accompanying drawings.
Neural networks are built on the concept of neurons. A neuron is similar to a perceptron: the activation function of an ordinary perceptron is a step function, while the activation function of a neuron is commonly the sigmoid function, defined as follows:
f(x) = 1 / (1 + e^(-x))
The input-output curve of the sigmoid function is shown in fig. 1; it maps a real number into the interval (0, 1) and is therefore suitable for classification.
A neural network is a plurality of neuron systems connected according to a certain rule. Taking a convolutional neural network as an example, the convolutional neural network is roughly composed of the following four layer structures: an input layer, a convolution layer, a pooling layer, and a fully connected layer. Fig. 2 is a four-layer structure diagram showing a neural network 200.
The input layer 201 intercepts part of the information from the input data and converts it into the form of a feature matrix, where the feature matrix carries the features corresponding to that part of the information. The input data here may include, but is not limited to, image data, voice data, or text data.
Convolutional layer 202 is configured to receive the feature matrix from input layer 201 and perform feature extraction on the input data through convolution operations. In practice, convolutional layer 202 may be built as multiple convolutional layers. Taking image data as an example, the earlier convolutional layers capture local, detailed information of the image (each pixel of the output image is computed from only a small range of the input image), while the receptive field of the later convolutional layers grows layer by layer so that they capture more complex, abstract information; after several convolutional layers, abstract representations of the image at different scales are finally obtained. Although feature extraction of the input image is completed by the convolution operations, the resulting feature maps carry too much information and have too high a dimensionality, which is not only computationally expensive but also prone to overfitting, so the dimensionality needs to be reduced further.
Pooling layer 203 is configured to replace a region of data with a value that is typically the maximum or average of all values in the region. If maximum is used, it is called maximum pooling; if an average is used, it is called mean pooling. By pooling, the model size can be reduced and the computation speed increased without losing too much information.
The fully connected layer 204 acts as a classifier for the whole convolutional neural network 200. It is equivalent to a feature-space transformation that extracts and integrates all the useful information; together with the non-linear mapping of the activation function, multiple fully connected layers can in theory simulate any non-linear transformation. It compares the information against the different classes and thereby determines whether the input data resembles the compared object.
A neural network needs weights as model parameters, and training the neural network seeks the optimal solution for the weights. In addition, some parameters, such as the connection pattern of the neural network, the number of layers, and the number of nodes per layer, are not obtained by learning but are set in advance; these manually set parameters are called hyper-parameters.
The neural network is divided into forward propagation and backward propagation, the forward propagation is forward calculation in the direction shown in fig. 2, and the state value and the activation value of each neuron are calculated sequentially from an input layer to an output layer. The back propagation is calculated from the output to the input in reverse in order to find the gradient values.
When input data are computed in a neural network, one generally hopes to find the solution that minimizes the loss function, which indicates the result closest to the real situation. However, the loss function of a neural network is complex, and an optimal analytical expression is hard to obtain. The common practice is to move along the negative gradient direction, because the negative gradient direction is the direction in which the loss function decreases fastest; the back-propagation algorithm computes this gradient so that the model parameters can subsequently be updated by gradient descent. Therefore, the back-propagation algorithm starts from the output layer of the neural network model and computes the gradients layer by layer using the chain rule of differentiation, in the hope of obtaining the solution that minimizes the loss function.
When the gradient is computed in this way, each neural unit is computed only once and is not computed repeatedly. The fundamental reason this computation order is efficient is that, when computing the gradient, an earlier-stage unit depends on the computation of the later-stage unit; by computing the gradient of the later-stage unit first and then the earlier-stage unit, the already computed results are fully reused and repeated computation is avoided.
In neural network training, regularization penalties and the ReLU activation function are often used. The regularization penalty addresses overfitting: to fit every sample, a poorly generalizing high-order function produces large oscillations, those oscillations make the derivatives large, and large parameters are needed to fit all the data; adding a penalty term therefore penalizes the cases with large parameters and avoids parameters with large oscillations.
The ReLU activation function means that, when the model has N layers, the activation rate of the neurons is theoretically reduced by a factor of 2^N, so ReLU better allows a sparse model to mine the relevant features and fit the training data. Furthermore, ReLU has the following advantages over other activation functions: compared with linear functions, ReLU has stronger expressive power, which is especially evident in deep networks; compared with non-linear functions, the gradient of ReLU is constant over the non-negative interval, so there is no vanishing-gradient problem and the convergence rate of the model remains stable. The vanishing-gradient problem means that when the gradient is smaller than 1, the error between the predicted value and the true value is attenuated at every propagation layer; if the sigmoid function is used as the activation function in a deep model, this phenomenon is particularly pronounced and the model fails to converge.
Due to the regularization penalty and the effect of the ReLU activation function, the numerical distributions of the weights, activation values, and gradients of each layer are not uniform; throughout training they remain concentrated near 0, for example in a Gaussian-like distribution. Because the values are concentrated around 0, uniform quantization quantizes many values close to 0 all the way to 0, which seriously affects the direction of network training. For example, with 7-bit uniform quantization, any value whose absolute value is less than max{abs(D)}/2^7 will be quantized to 0. Such quantization is intolerable for training. Under prior-art uniform quantization schemes, the factor limiting the accuracy of network training comes mainly from the quantization error of the gradients, especially the quantization error of the gradients near 0 in the numerical distribution.
The present disclosure provides an adaptive non-uniform low-bit quantization scheme that is suitable for the training and inference tasks of various neural networks (such as convolutional neural networks, recurrent neural networks, graph neural networks, etc.). As mentioned above, two types of parameters are involved in improving a neural network: one type is ordinary parameters, i.e., parameter data obtained by training, such as the weight w and offset b in y = wx + b; the other type is called hyper-parameters, i.e., parameter data that cannot be obtained by training and is set before learning starts.
The apparatus in the embodiments of the present disclosure, and the various devices, units, and modules described below, may be implemented in the form of hardware circuits, such as digital circuits or analog circuits. Physical implementations of the hardware structures include, but are not limited to, transistors, memristors, and the like. Unless otherwise specified, the artificial intelligence processor mentioned in the embodiments may be any suitable hardware processor, such as a CPU, GPU, FPGA, DSP, ASIC, etc. Unless otherwise specified, the memory, memory device, and storage unit may be any suitable magnetic or magneto-optical storage medium, such as Resistive Random Access Memory (RRAM), Dynamic Random Access Memory (DRAM), Static Random Access Memory (SRAM), Enhanced Dynamic Random Access Memory (EDRAM), High-Bandwidth Memory (HBM), Hybrid Memory Cube (HMC), and the like.
Alternatively, when the devices, units, and modules described below are implemented in an ASIC, the implementation has advantages over other hardware in power consumption, reliability, area, and the like, and is especially suitable for high-performance, low-power mobile terminals.
One embodiment of the present disclosure is a neural network computing device that divides a numerical distribution of floating point input data into a first interval and a second interval, where different quantization modes are used for floating point data falling in different intervals. A schematic diagram of this embodiment is shown in fig. 3.
The neural network computing device of this embodiment includes a control unit 31, a quantization unit 32, and a calculation unit 33. The control unit 31 is used for providing various parameters and hyper-parameters required in the quantization process; the quantization unit 32 quantizes floating point data according to the parameter and the hyper-parameter to generate fixed point data; the calculation unit 33 is used for calculating the neural network by using the fixed point data. The units of fig. 3 may be implemented in hardware circuits.
The control unit 31 is used for providing the division parameter α, the offset value shift, the division value x_p, and other parameters required in the quantization process. As shown in fig. 4, the control unit 31 includes an input/output module 311, a division parameter generator 312, an offset value generator 313, and a division value generator 314.
The input/output module 311 serves as a channel for signal transmission between the control unit 31 and the external unit, and can output a signal when a control request occurs, and at the same time, can receive a control signal from the external unit, and send the control signal to the division parameter generator 312, the offset value generator 313, or the division value generator 314.
The division parameter generator 312 is used to generate a hyper-parameter: the division parameter α. The division parameter α is used to define the first interval and the second interval. If the input data x (floating-point numbers) are distributed in the numerical distribution D, the relationship between the division parameter α and the first interval A and second interval B is:

A = {x | abs(x) < 2^α · max{abs(D)}, x ∈ D}    (1)

B = {x | abs(x) ≥ 2^α · max{abs(D)}, x ∈ D}    (2)
referring to FIG. 5, the Gaussian distribution curve represents the possible value distribution of the input data x, the input data x substantially completely falls within the value distribution D, and the first interval A isA positive-negative symmetrical section including 0, i.e., a range between two imaginary lines in the drawing, and a range outside the imaginary lines is a second section B. When the input data x is less than 2αmax { abs (d) }, which indicates that the input data x falls within the first interval a; when the input data x is greater than or equal to 2αmax { abs (d) }, indicates that the input data x falls within the second interval B.
The division parameter α determines the position of the dashed lines in fig. 5, i.e., the range of the first interval A; the smaller the absolute value of the division parameter α, the larger the range of the first interval A. The division parameter α is related to the quantization bit width b, which is the number of bits of the fixed-point number; the quantization bit width b is a positive integer and the division parameter α is a negative integer. In this embodiment the relationship between the two is:
50%×b≤abs(α)≤90%×b
Taking a quantization bit width b of 8 (i.e., an 8-bit fixed-point number) as an example, the division parameter α is optionally one of -4, -5, -6, and -7.
The offset value generator 313 is configured to generate the offset value shift, which represents a quantization offset. The offset value generator 313 generates the offset value shift according to expression (3), which appears as an image in the original publication; in that expression, the ceil function returns the smallest integer greater than or equal to its argument.
The division value generator 314 is used to generate the division value x_p. The division value x_p is the boundary between the first interval A and the second interval B on the abscissa of fig. 5, i.e., the abscissa value corresponding to the dashed lines. The division value generator 314 generates the division value x_p according to the following expression:

x_p = 2^α · max{abs(D)}    (4)

The division parameter α, the offset value shift, and the division value x_p are the parameters required in the quantization process; the parameters generated by the control unit 31 are sent to the quantization unit 32 through the input/output module 311.
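As a rough software sketch of the parameters the control unit provides (expression (3) for shift is not reproduced here, so shift is passed in as a given; the default choice of α and the example data are assumptions):

    import math

    def control_unit_params(data, b=8, alpha=None, shift=None):
        # Choose alpha so that 50% * b <= abs(alpha) <= 90% * b (alpha negative),
        # and compute the division value x_p = 2**alpha * max{abs(D)} (equation (4)).
        if alpha is None:
            alpha = -math.ceil(0.75 * b)                 # assumed default in the allowed range
        assert alpha < 0 and 0.5 * b <= abs(alpha) <= 0.9 * b
        max_abs = max(abs(v) for v in data)              # max{abs(D)} over the distribution D
        x_p = 2.0 ** alpha * max_abs
        return alpha, shift, x_p

    # Example: a toy distribution, with shift supplied externally for this sketch.
    print(control_unit_params([0.002, -0.4, 0.03, 1.5], b=8, shift=-8))
    # -> (-6, -8, 0.0234375)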
Optionally, in an embodiment, the control unit 31 may also have other hardware circuit implementations, which are not described herein again.
Returning to fig. 3, the quantization unit 32 includes: absolute value operator 321, comparator 322, two-way selector 323, adder 324, quantizer 325 and bus converter 326.
The absolute value operator 321 receives the input data x, takes the absolute value of the floating-point input data x, and outputs the absolute value abs(x). The comparator 322 receives the absolute value abs(x) and the division value x_p and compares them. If the division value x_p is greater than the absolute value abs(x), the input data x falls within the first interval A (refer to equation (1)), and the comparator 322 sets the flag bit flag to 1; if the division value x_p is less than or equal to the absolute value abs(x), the input data x falls within the second interval B (refer to equation (2)), and the flag bit flag is set to 0. The value of flag therefore records whether the input data x falls in the first interval A or the second interval B.
The two-way selector 323 receives the division parameter α from the control unit 31 and decides, based on the value of the flag bit flag, whether to output the division parameter α or 0. When the value of flag is 1, the input data x falls within the first interval A; the values in the first interval A, which contains 0, are easily quantized to 0 and need finer precision, so the two-way selector 323 sets its output value to the division parameter α. When the value of flag is 0, the input data x falls within the second interval B; the values in this interval do not require the precision adjustment, so the two-way selector 323 sets its output value to 0.

The adder 324 receives the output of the two-way selector 323 and the offset value shift from the control unit 31, and adds the offset value shift to the output value of the two-way selector 323 to generate the quantization offset value s. That is, when the value of flag is 1, the quantization offset value s is the division parameter α plus the offset value shift; when the value of flag is 0, the quantization offset value s is the offset value shift.
The quantizer 325 quantizes the input data x into n-bit fixed-point data according to:

x[n-1:0] = round(x / 2^s)    (5)

where 2^s is called the quantization interval, the round function performs rounding, and x[n-1:0] is the fixed-point data quantized from the input data x. The output x[n-1:0] of the quantizer 325 is, however, only intermediate data, not the final quantization result.

The bus converter 326 is used to combine the value of flag with the intermediate data x[n-1:0] to generate the fixed-point data x[n:0]; more specifically, it appends the value of flag to the intermediate data x[n-1:0], so that the final fixed-point data x[n:0] has n+1 bits.

In another scenario, if the flag value is not important to the computing unit 33, this embodiment may omit the bus converter 326, and the output x[n-1:0] of the quantizer 325 is the final quantization result and is transmitted directly to the computing unit 33.
Optionally, in an embodiment, the quantization unit 32 may also have other hardware circuit implementations, which are not described herein again.
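A possible software rendering of the data flow just described (the parameter values in the example are assumptions, and the helper is a sketch rather than the hardware itself):

    def quantize_unit(x, alpha, shift, x_p):
        # Comparator: flag = 1 if the data falls in the first interval A, else 0.
        flag = 1 if abs(x) < x_p else 0
        # Two-way selector plus adder: quantization offset value s.
        s = (alpha if flag == 1 else 0) + shift
        # Quantizer, per equation (5): intermediate fixed-point code x[n-1:0].
        q = round(x / 2.0 ** s)
        return q, flag, s

    # Example with assumed parameters alpha = -6, shift = -8, x_p = 0.0234375:
    for x in (0.0004, 0.2):
        q, flag, s = quantize_unit(x, -6, -8, 0.0234375)
        print(x, "->", q, "flag:", flag, "restored:", q * 2.0 ** s)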
This embodiment works together with a new non-uniform fixed-point quantization data structure; the data format of the non-uniform fixed-point data x[n:0] is described below. Fig. 6 shows an 8-bit non-uniform fixed-point quantization data structure, which includes a sign bit 61, value bits 62, and a flag bit 63. The sign bit 61 is 1 bit, is the most significant bit (MSB), and records the sign of the fixed-point data. The flag bit 63 is 1 bit, is the least significant bit (LSB), and records the value of the flag bit flag. The middle 6 bits are the value bits 62, which record the magnitude of the fixed-point data. The output x[n-1:0] of the quantizer 325 corresponds to the sign bit 61 and the value bits 62 of the non-uniform fixed-point quantization data structure, and the bus converter 326 records the value of flag in the flag bit 63 to generate the complete fixed-point data x[n:0]. Based on the non-uniform fixed-point quantization data structure of fig. 6, the relationship between the corresponding floating-point input data x and the sign bit 61, value bits 62, and flag bit 63 is:

x = (-1)^sign × value × 2^(α·flag) × interval    (6)

where sign is the value of the sign bit 61 and value is the value of the value bits 62.
If the fixed-point number is not 8 bits, but n, then under this fixed-point data structure, its most significant and least significant bits are again sign bit 61 and flag bit 63, and the middle n-2 bits are value bits 62.
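To make the layout concrete, here is a hedged sketch of packing and unpacking an 8-bit word in the fig. 6 format (the bit arithmetic is an assumption about one reasonable encoding, not a statement of the exact hardware wiring):

    def pack_nonuniform(q, flag, n_bits=8):
        # MSB = sign bit 61, middle bits = value bits 62, LSB = flag bit 63.
        sign = 1 if q < 0 else 0
        value = abs(q)
        assert value < 2 ** (n_bits - 2), "magnitude must fit in the value bits"
        return (sign << (n_bits - 1)) | (value << 1) | flag

    def unpack_nonuniform(word, alpha, shift, n_bits=8):
        # Decode per equation (6): x = (-1)**sign * value * 2**(alpha*flag) * interval,
        # with interval = 2**shift.
        sign = (word >> (n_bits - 1)) & 1
        value = (word >> 1) & (2 ** (n_bits - 2) - 1)
        flag = word & 1
        return (-1) ** sign * value * 2.0 ** (alpha * flag + shift)

    word = pack_nonuniform(q=-7, flag=1)
    print(bin(word), unpack_nonuniform(word, alpha=-6, shift=-8))
    # -> 0b10001111 -0.00042724609375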
Returning to FIG. 3: in the quantization unit 32, the floating-point input data x is converted into the fixed-point data x[n:0], which is sent to the calculation unit 33 for calculation. The calculation unit 33 may perform a specific calculation according to actual requirements, for example the computation of the fully connected layer 204 in fig. 2. After the calculation is completed, an intermediate result y[n:0] is generated; the intermediate result y[n:0] is also fixed-point data, and the calculation unit 33 restores the intermediate result y[n:0] to floating-point data y, thereby completing the whole calculation process.
The foregoing description uses the input data x as the activation value, but the disclosure is not limited thereto, that is, in the neural network, any data that needs to be quantized (such as weight and gradient, etc.) can be converted into fixed point data by the quantization unit 32.
In summary, this embodiment implements a method of computing a neural network, and fig. 7 shows a flowchart of this method. The flow of this embodiment is described on the basis of the foregoing hardware design, it being understood that the disclosure is not limited to that hardware design; the hardware may be digital circuits, analog circuits, or the like. Physical implementations of the hardware structures include, but are not limited to, transistors, memristors, etc., and the artificial intelligence processor mentioned in the embodiments may be any suitable hardware processor, such as a CPU, GPU, FPGA, DSP, ASIC, etc. Unless otherwise specified, the memory, storage device, and storage unit may be any suitable magnetic or magneto-optical storage medium, such as resistive random access memory RRAM, dynamic random access memory DRAM, static random access memory SRAM, enhanced dynamic random access memory EDRAM, high-bandwidth memory HBM, hybrid memory cube HMC, and the like.
In step 71, the absolute value operator 321 receives floating point data of the computational neural network, which falls within the numerical distribution. Referring to fig. 3, the absolute value operator 321 receives input data x, takes an absolute value of the input data x in a floating-point format, and outputs an absolute value abs (x). It is particularly emphasized that in this embodiment, a series of floating-point data is received, each of which is processed according to the flow of FIG. 7.
In step 72, the control unit 31 divides the numerical distribution into a first interval and a second interval based on the division parameter. Referring to fig. 4, the division parameter generator 312 of the control unit 31 generates the division parameter α, which defines the first interval A and the second interval B through equations (1) and (2).
In step 73, the comparator 322 determines whether the floating-point data falls within the first interval. Referring to fig. 3, the comparator 322 receives the absolute value abs(x) and the division value x_p and compares them. If the division value x_p is greater than the absolute value abs(x), the input data x falls within the first interval A and the comparator 322 sets the flag bit flag to 1; if the division value x_p is less than or equal to the absolute value abs(x), the input data x falls within the second interval B and the flag bit flag is set to 0.
If the floating-point data falls within the first interval, step 74 is performed: the two-way selector 323, the adder 324, the quantizer 325, and the bus converter 326 quantize the floating-point data according to the division parameter to generate the fixed-point data. Referring to fig. 3, the two-way selector 323 receives the division parameter α from the control unit 31 and decides whether to output the division parameter α or 0 based on the value of the flag bit flag. Since the floating-point data falls within the first interval, the value of flag is 1, and the two-way selector 323 sets its output value to the division parameter α. The quantization offset value s output by the adder 324 is the division parameter α plus the offset value shift. The quantizer 325 quantizes the input data x into the intermediate data x_q based on equation (5), and the bus converter 326 then appends the value of flag to the intermediate data x_q to generate the fixed-point data x[n:0].
If the floating-point data does not fall within the first interval, step 75 is performed: the two-way selector 323, the adder 324, the quantizer 325, and the bus converter 326 quantize the floating-point data not according to the division parameter but according to the following expressions to generate the fixed-point data x_q: the offset value shift is obtained from expression (7), which has the same form as expression (3), and

interval = 2^shift    (8)

x_q = round(x / interval)    (9)

These expressions do not differ substantially from expressions (3) and (5), except that the floating-point data falls within the second interval B, so the value of flag is 0 and the two-way selector 323 sets its output value to 0. The quantization offset value s output by the adder 324 is then just the offset value shift. The quantizer 325 quantizes the input data x into the intermediate data x_q, and the bus converter 326 appends the flag value flag to the intermediate data x_q to generate the fixed-point data x[n:0].
In step 76, the calculation unit 33 calculates the neural network using the fixed-point data x[n:0]. Referring to fig. 3, the calculation unit 33 can perform the specific calculation according to actual needs and generates an intermediate result y[n:0] after the calculation is completed; the intermediate result y[n:0] is also fixed-point data, and the calculation unit 33 restores the intermediate result y[n:0] to floating-point data y, thereby completing the whole calculation process.

This embodiment is an "adaptive" scheme because the comparator 322 generates the flag bit flag to record whether the input data x falls in the first interval A or the second interval B, so that the quantization unit 32 can select a suitable precision for quantization. Furthermore, the first interval A and the second interval B have different precisions: the first interval A, which contains 0, is sliced more finely using the factor 2^α, so that the large amount of data close to 0 is not quantized to 0. This embodiment is therefore also a "non-uniform" scheme.
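The adaptive, non-uniform behaviour can be seen in a few lines of NumPy (α, shift, and x_p are assumed values; the helper quantizes and immediately dequantizes so the effect on the values is visible):

    import numpy as np

    def nonuniform_q(x, alpha, shift, x_p):
        # Two-interval quantize-dequantize: finer step inside abs(x) < x_p.
        s = np.where(np.abs(x) < x_p, alpha + shift, shift)
        return np.round(x / 2.0 ** s) * 2.0 ** s

    x = np.array([2e-4, 8e-4, 0.01, 0.3, 0.9])
    print(nonuniform_q(x, alpha=-6, shift=-8, x_p=0.0234375))  # near-zero values survive
    print(np.round(x / 2.0 ** -8) * 2.0 ** -8)                 # a uniform step flushes them to 0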
For the neural network, the calculating unit 33 of the foregoing embodiment can perform an important calculation, namely, matrix multiplication, where the matrix multiplication involves multiplication of a large number of fixed-point numbers, and the present disclosure proposes a multiplication unit with a special structure based on the fixed-point data structure of fig. 6.
When multiplying two pieces of fixed-point data having the fixed-point data structure of fig. 6, assume that the first fixed-point data x1 fall within a first numerical distribution and the second fixed-point data x2 fall within a second numerical distribution; the first division parameter α1 divides the first numerical distribution into a first interval and a second interval, and the second division parameter α2 divides the second numerical distribution into a third interval and a fourth interval; the first flag value flag1 reflects whether the first fixed-point data x1 fall in the first or the second interval, and the second flag value flag2 reflects whether the second fixed-point data x2 fall in the third or the fourth interval. Based on formula (6), the floating-point product x1 × x2 is:

x1 × x2 = (-1)^(sign1 + sign2) × value1 × value2 × 2^(α1·flag1 + α2·flag2) × interval1 × interval2    (10)
another embodiment of the present disclosure is a multiplication unit, which is schematically shown in fig. 8, for implementing the calculation of equation (10). The multiplication unit 80 of this embodiment includes a multiplier 81, an addition module 82, and a fixed-point to floating-point converter 83.
The multiplier 81 is used to multiply the first fixed-point data x1[n-1:0] and the second fixed-point data x2[n-1:0] to generate the fixed-point product y[2n-1:0], that is, to implement the part (-1)^(sign1 + sign2) × value1 × value2 of formula (10). The first fixed-point data x1[n-1:0] and the second fixed-point data x2[n-1:0] may come from the output of the quantizer 325 of fig. 3, or from the output of the bus converter 326 with the lowest-order flag bit flag removed from the data structure, leaving the sign bit and the value bits. Since the first fixed-point data x1[n-1:0] and the second fixed-point data x2[n-1:0] are both n bits, the fixed-point product y[2n-1:0] is 2n bits.
The addition module 82 is used to add a plurality of quantization offset coefficients corresponding to the first fixed-point data x1[n-1:0] and the second fixed-point data x2[n-1:0] to generate the quantization offset coefficient sum, i.e., to implement the exponent α1·flag1 + α2·flag2 + shift1 + shift2 of formula (10). As shown in formula (8), the terms interval1 and interval2 of formula (10) are equal to 2^shift1 and 2^shift2, respectively, so the quantization offset coefficients involve the division parameter α, the flag value flag, and the offset value shift.
The addition module 82 includes a first selector 821, a second selector 822, and an adder 823. The first selector 821 sets a first output value to the first division parameter α1 or to 0 according to the first flag value flag1. When the first flag value flag1 is 1, i.e., the value of the flag bit x1[n] is 1, the first fixed-point data x1[n-1:0] fall in the first interval and the first division parameter α1 needs to participate in the calculation, so the first division parameter α1 is output; when the first flag value flag1 is 0, i.e., the value of the flag bit x1[n] is 0, the first fixed-point data x1[n-1:0] fall in the second interval and the first division parameter α1 does not participate in the calculation, so 0 is output. Likewise, the second selector 822 sets a second output value to the second division parameter α2 or to 0 according to the second flag value flag2; its operation is the same as that of the first selector 821 and is not repeated here.

The adder 823 adds the first offset value shift1, the second offset value shift2, the first output value, and the second output value to generate the quantization offset coefficient sum s, i.e., α1·flag1 + α2·flag2 + shift1 + shift2, where the first offset value shift1 and the second offset value shift2 are calculated according to equation (3).
The fixed-point to floating-point converter 83 converts the fixed-point product y[2n-1:0] into the floating-point data y according to the quantization offset coefficient sum s. The fixed-point to floating-point converter 83 includes a power-of-2 calculator 831 and a multiplier 832. The power-of-2 calculator 831 takes the quantization offset coefficient sum s as the exponent and generates the quantization offset value 2^s, realizing the factor 2^(α1·flag1 + α2·flag2) × interval1 × interval2 of formula (10). The multiplier 832 multiplies the fixed-point product y[2n-1:0] by the quantization offset value 2^s to obtain the floating-point product x1 × x2.
Optionally, in an embodiment, the multiplication unit 80 may also have other hardware circuit implementations, which are not described herein again.
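The behaviour of multiplication unit 80 can be mimicked in software as follows (a sketch under the assumption that the operand codes and parameters come from the quantization flow above; not the hardware itself):

    def multiply_unit(q1, flag1, q2, flag2, alpha1, alpha2, shift1, shift2):
        # Multiplier 81: fixed-point product y[2n-1:0].
        y_fixed = q1 * q2
        # Addition module 82: quantization offset coefficient sum s.
        s = alpha1 * flag1 + alpha2 * flag2 + shift1 + shift2
        # Fixed-point to floating-point converter 83: rescale by 2**s (equation (10)).
        return y_fixed * 2.0 ** s

    # Example: codes produced with assumed parameters alpha = -6, shift = -8 for both operands.
    print(multiply_unit(q1=7, flag1=1, q2=51, flag2=0,
                        alpha1=-6, alpha2=-6, shift1=-8, shift2=-8))
    # -> 8.511543273925781e-05, close to the true floating-point product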
In summary, this embodiment implements a method of computing a neural network by performing multiplication, and the flowchart of this method is shown in fig. 9. The flow of this embodiment is described on the basis of the foregoing hardware design, it being understood that the disclosure is not limited to that hardware design; the hardware may be digital circuits, analog circuits, or the like. Physical implementations of the hardware structures include, but are not limited to, transistors, memristors, etc., and the artificial intelligence processor mentioned in the embodiments may be any suitable hardware processor, such as a CPU, GPU, FPGA, DSP, ASIC, etc. Unless otherwise specified, the memory, storage device, and storage unit may be any suitable magnetic or magneto-optical storage medium, such as resistive random access memory RRAM, dynamic random access memory DRAM, static random access memory SRAM, enhanced dynamic random access memory EDRAM, high-bandwidth memory HBM, hybrid memory cube HMC, and the like.
In step 91, the control unit 31 provides the first division parameter α1 and the second division parameter α2. In more detail, the division parameter generator 312 of the control unit 31 generates the first division parameter α1 and the second division parameter α2.
In step 92, the quantization unit 32 quantizes the first floating-point data x1 according to the first division parameter α1 to generate the first fixed-point data x1[n:0]. The comparator 322 generates the division value x_p from the first division parameter α1 according to equation (4), dividing the first numerical distribution into a first interval A and a second interval B, and then judges whether the first floating-point data x1 falls within the first interval A defined by equation (1). If the first floating-point data x1 falls within the first interval A, the two-way selector 323, the adder 324, and the quantizer 325 generate the intermediate fixed-point data x1[n-1:0] with the quantization offset value s1 = α1 + shift1, i.e., x1[n-1:0] = round(x1 / 2^(α1 + shift1)). If the first floating-point data x1 does not fall within the first interval A, they generate the intermediate fixed-point data x1[n-1:0] with the quantization offset value s1 = shift1, i.e., x1[n-1:0] = round(x1 / 2^shift1). Finally the bus converter 326 appends the flag bit flag to generate the complete first fixed-point data x1[n:0].
In step 93, the quantization unit 32 quantizes the second floating-point data x2 according to the second division parameter α2 to generate the second fixed-point data x2[n:0]. Likewise, the comparator 322 generates the division value x_p from the second division parameter α2 according to equation (4), dividing the second numerical distribution into a third interval and a fourth interval, and then judges whether the second floating-point data x2 falls within the third interval defined by equation (1). If the second floating-point data x2 falls within the third interval, the two-way selector 323, the adder 324, and the quantizer 325 generate the intermediate fixed-point data x2[n-1:0] with the quantization offset value s2 = α2 + shift2, i.e., x2[n-1:0] = round(x2 / 2^(α2 + shift2)); if it does not fall within the third interval, they generate the intermediate fixed-point data x2[n-1:0] with s2 = shift2, i.e., x2[n-1:0] = round(x2 / 2^shift2). Finally, the bus converter 326 appends the flag bit flag to generate the complete second fixed-point data x2[n:0].
In step 94, the multiplication unit 80 performs a multiplication operation on the first fixed-point data x1[n:0] and the second fixed-point data x2[n:0] to generate the intermediate data y. As mentioned above, in the data structure of the fixed-point data of the present disclosure, the flag bit records the interval of the first fixed-point data x1[n:0] and of the second fixed-point data x2[n:0], the sign bit records their signs, and the value bits record their values V1 and V2. The multiplier 81 multiplies the first fixed-point data x1[n:0] and the second fixed-point data x2[n:0] to generate the sign sum sign_t and the value product; the first selector 821 in effect multiplies the first division parameter α1 by the flag bit x1[n] of the first fixed-point data to generate the first parameter multiplication value pm1; the second selector 822 in effect multiplies the second division parameter α2 by the flag bit x2[n] of the second fixed-point data to generate the second parameter multiplication value pm2; finally, the fixed-point to floating-point converter 83 executes the following expression to generate the intermediate data:

y = (-1)^sign_t × V1 × V2 × 2^(pm1 + pm2 + shift1 + shift2)
in step 95, a neural network is calculated from the intermediate data. In the neural network architecture shown in fig. 2, the input data may be various, such as image, voice, text data, etc., and the data undergoes a large number of multiplications in the input layer 201, the convolutional layer 202, the pooling layer 203, and the fully-connected layer 204, and all of the multiplications can be implemented by the aforementioned steps until the inference process is completed and the image, voice, text data are finally recognized. In addition to the input data, the parameters in the neural network can be converted into fixed point data by the steps and then multiplied by the input data.
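As a small end-to-end illustration of steps 91 to 95 for a single neuron (all parameter values are assumptions, and the helper repeats the simplified quantizer used in the sketches above):

    def q(x, alpha, shift, x_p):
        # Simplified quantizer: returns the fixed-point code and the flag bit.
        flag = 1 if abs(x) < x_p else 0
        return round(x / 2.0 ** ((alpha if flag else 0) + shift)), flag

    def fig9_dot(xs, ws, alpha1, shift1, xp1, alpha2, shift2, xp2):
        # Quantize each activation/weight pair, multiply the codes, rescale by 2**s,
        # and accumulate in floating point.
        acc = 0.0
        for x, w in zip(xs, ws):
            qx, fx = q(x, alpha1, shift1, xp1)
            qw, fw = q(w, alpha2, shift2, xp2)
            s = alpha1 * fx + alpha2 * fw + shift1 + shift2
            acc += qx * qw * 2.0 ** s
        return acc

    print(fig9_dot([0.0004, 0.2, -0.05], [0.3, -0.001, 0.7],
                   -6, -8, 0.02, -6, -8, 0.02))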
Because data close to 0 being quantized to 0 during quantization affects the calculation, the above embodiments divide the numerical distribution of the input data into two intervals: one is a symmetric interval around 0, and the other is the interval outside it. However, the present disclosure does not limit the number of intervals; the distribution may be divided into a plurality of intervals as appropriate, as long as the numerical ranges that need different precision are quantized with different accuracy.
When there are more than two intervals, the data structure of fig. 6 only needs to adjust the size of the flag bit 63. Taking 3 or 4 intervals as an example, in order to record which of the 3 or 4 intervals the data falls in, the flag bit 63 needs 2 bits, as shown by the flag bit in fig. 10. In other words, if the numerical distribution is divided into N intervals, the flag bit requires ceil[log2(N)] bits.
Another embodiment of the present disclosure is a neural network computing device. It differs from the foregoing embodiments in that it divides the numerical distribution of the floating-point data into a first interval, a second interval and a third interval, and adopts different quantization modes when the floating-point data falls into different intervals. As shown in fig. 11, the numerical distribution D is divided into a first interval A, a second interval B and a third interval C according to the quantization precision.
This embodiment defines the 3 intervals with 2 partition parameters: the partition parameter α1 delimits the first interval A from the second interval B, and the partition parameter α2 delimits the second interval B from the third interval C. The relationship between the partition parameters α1, α2 and the first interval A, the second interval B and the third interval C is:
[Interval definitions shown only as images in the original publication.]
In calculating the fixed-point data xq for data falling in the first interval A, the following expressions may be employed:
[Expressions shown only as images in the original publication.]
In calculating the fixed-point data xq for data falling in the second interval B, the following expressions may be employed:
[Expressions shown only as images in the original publication.]
In calculating the fixed-point data xq for data falling in the third interval C, the following expressions may be employed:
[Expression shown only as an image in the original publication.]
interval3 = 2^shift
[Expression shown only as an image in the original publication.]
Fig. 12 is a schematic diagram of this embodiment. Its framework is not significantly different from that of fig. 3, differing only in the control unit 121, the comparator 122 and the three-way selector 123.
Compared to the control unit 31 of fig. 3, the control unit 121 outputs the partition parameters α1 and α2 to the three-way selector 123, and outputs a first division value xp1 corresponding to the partition parameter α1 and a second division value xp2 corresponding to the partition parameter α2 to the comparator 122. Their expressions are as follows:
xp1 = 2^α1 · max{abs(D)}
xp2 = 2^α2 · max{abs(D)}
Compared to the comparator 322 of fig. 3, the comparator 122 is implemented as a two-stage comparison circuit. The first stage compares the absolute value abs(x) of the input data x with the first division value xp1. If the absolute value abs(x) is smaller than the first division value xp1, the input data x falls in the first interval A, so there is no need to compare with the second division value xp2 and the output flag value flag is 00. If the absolute value abs(x) is not smaller than the first division value xp1, the input data x falls in the second interval B or the third interval C, and the second-stage circuit compares the absolute value abs(x) with the second division value xp2. If the absolute value abs(x) is smaller than the second division value xp2, the input data x falls in the second interval B, so the output flag value flag is 01; if the absolute value abs(x) is not smaller than the second division value xp2, the input data x falls in the third interval C, so the output flag value flag is 10.
Compared to the two-way selector 323 of fig. 3, the three-way selector 123 receives the partition parameters α1 and α2 and, based on the value of the flag bit flag, determines whether the output is α1, α2 or 0. When the flag value is 00, the input data x falls within the first interval A, so the three-way selector 123 sets the output value to the partition parameter α1. When the flag value is 01, the input data x falls within the second interval B, so the three-way selector 123 sets the output value to the partition parameter α2. When the flag value is 10, the input data x falls within the third interval C, so the three-way selector 123 sets the output value to 0.
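A minimal sketch of this two-stage comparison and three-way selection follows; the function name and return convention are illustrative, while the flag encoding (00, 01, 10) and the selected outputs follow the description above.

    def classify_and_select(x, xp1, xp2, alpha1, alpha2):
        a = abs(x)
        if a < xp1:            # first stage: input falls in the first interval A
            flag, selected = 0b00, alpha1
        elif a < xp2:          # second stage: input falls in the second interval B
            flag, selected = 0b01, alpha2
        else:                  # input falls in the third interval C
            flag, selected = 0b10, 0
        return flag, selected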
The other elements operate in the same way as the corresponding elements in fig. 3 and are therefore not described again. Optionally, the embodiment of fig. 12 may also be realized with other hardware circuit implementations, which are likewise not described here.
The embodiment of fig. 12 is illustrated with 3 intervals, but the disclosure does not limit the number of intervals; those skilled in the art can readily extend it to more intervals without creative effort.
Another embodiment of the present disclosure is a method of forward propagation in a neural network, that is, the inference process of the neural network, which may be implemented using the apparatus of fig. 3 or fig. 12. For convenience of explanation, the following description is made in conjunction with the embodiment of fig. 3; fig. 13 shows a flowchart of the method of this embodiment. The flow of this embodiment is described on the basis of the foregoing hardware design, but it should be understood that the disclosure is not limited to a particular hardware design, which may be implemented with digital circuits, analog circuits, or the like. The physical implementation of the hardware structure includes but is not limited to transistors, memristors, etc., and the artificial intelligence processor mentioned in the embodiments may be any appropriate hardware processor, such as a CPU, GPU, FPGA, DSP, ASIC, etc. Unless otherwise specified, the memory, storage device and storage unit may be any suitable magnetic or magneto-optical storage medium, such as resistive random access memory (RRAM), dynamic random access memory (DRAM), static random access memory (SRAM), enhanced dynamic random access memory (EDRAM), high bandwidth memory (HBM), hybrid memory cube (HMC), and the like.
In step 1301, the absolute value operator 321 receives the activation value x1 and the weight x2 required for calculating the current layer. The activation value x1 and the weight x2, in floating-point format, are input as input data to the absolute value operator 321, which outputs the absolute value abs(x).
In step 1302, the control unit 31 provides the first partition parameter α1, the first shift value shift1 and the first division value xp1 corresponding to the activation value x1. The first partition parameter α1, the first shift value shift1 and the first division value xp1 are all as described in the foregoing embodiments and are not described again; the first division value xp1 can be obtained by calculation according to equation (4).
In step 1303, the control unit 31 provides the second partition parameter α2, the second shift value shift2 and the second division value xp2 corresponding to the weight x2. The second division value xp2 can likewise be obtained by calculation according to equation (4).
In step 1304, the quantization unit 32 quantizes the activation value x1 according to the first partition parameter α1, the first shift value shift1 and the first division value xp1 to generate first fixed-point data. The comparator 322 judges, based on the first division value xp1, whether the activation value x1 falls within a first interval A defined by equation (1). If the activation value x1 falls within the first interval A, the two-way selector 323, the adder 324 and the quantizer 325 generate the intermediate fixed-point data x1[n-1:0] according to the following expressions:
[Expressions shown only as images in the original publication.]
If the activation value x1 does not fall within the first interval A, the two-way selector 323, the adder 324 and the quantizer 325 generate the intermediate fixed-point data x1[n-1:0] according to the following expressions:
[Expressions shown only as images in the original publication.]
Finally, the bus converter 326 adds the flag bit flag to generate the complete first fixed-point data x1[n:0].
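For illustration, the quantization path of steps 1301 to 1304 can be sketched as follows. The split value xp = 2^α · max{abs(D)} follows equation (4) as restated in clause A2; the exact rounding rule of the quantizer 325 appears only as an image in the original, so round-to-nearest with saturation to the b-bit signed range is an assumption here, as are the function name and argument list.

    def quantize(x, alpha, shift, xp, b=8):
        flag = 1 if abs(x) < xp else 0      # comparator 322
        sel = alpha if flag else 0          # two-way selector 323
        q_offset = shift + sel              # adder 324: quantization offset value
        v = round(x / 2.0 ** q_offset)      # quantizer 325 (assumed rounding rule)
        limit = 2 ** (b - 1) - 1
        v = max(-limit, min(limit, v))      # saturate to the b-bit signed range
        return flag, v                      # bus converter 326 packs flag and value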
In step 1305, the quantization unit 32 quantizes the weight x2 according to the second partition parameter α2, the second shift value shift2 and the second division value xp2 to generate second fixed-point data. In more detail, the weight x2 falls within a second numerical distribution, which is divided into a third interval and a fourth interval. The comparator 322 judges, based on the second division value xp2, whether the weight x2 falls within the third interval, which is also defined by equation (1). If the weight x2 falls within the third interval, the two-way selector 323, the adder 324 and the quantizer 325 generate the intermediate fixed-point data x2[n-1:0] according to the following expressions:
[Expressions shown only as images in the original publication.]
If the weight x2 does not fall within the third interval, the two-way selector 323, the adder 324 and the quantizer 325 generate the intermediate fixed-point data x2[n-1:0] according to the following expressions:
[Expressions shown only as images in the original publication.]
Finally, the bus converter 326 adds the flag bit flag to generate the complete second fixed-point data x2[n:0].
In step 1306, the calculation unit 33 performs a multiplication operation on the first fixed-point data x1[n:0] and the second fixed-point data x2[n:0] to generate intermediate data. In this embodiment, the calculation unit 33 has the structure of the multiplication unit 80 of fig. 8 and multiplies the first fixed-point data x1[n:0] by the second fixed-point data x2[n:0] to generate intermediate data y. As mentioned above, in the data structure of the fixed-point data of this disclosure, the flag bit records the interval in which the first fixed-point data x1[n:0] and the second fixed-point data x2[n:0] fall, the sign bit records their signs, and the value bits record their values V1 and V2. The multiplier 81 multiplies the first fixed-point data x1[n:0] by the second fixed-point data x2[n:0] to generate a signed value sign_t; the task of the first selector 821 is equivalent to multiplying the first division parameter α1 by the flag bit x1[n] of the first fixed-point data to generate a first parameter product value pm1; the task of the second selector 822 is equivalent to multiplying the second division parameter α2 by the flag bit x2[n] of the second fixed-point data to generate a second parameter product value pm2. Finally, the fixed-point-to-floating-point converter 83 evaluates the following expression to generate the intermediate data:
[Expression shown only as an image in the original publication.]
The intermediate data is the output result of the current layer and is also the input data, i.e. the activation value, of the next layer. The method then returns to step 1301, the activation value obtained in this step is input to the next layer, and the process is repeated until all layers have been calculated.
In step 1307, the neural network inference is completed. In the neural network architecture shown in fig. 2, the input data may take various forms, such as image, voice or text data; the multiplications of the foregoing steps are repeatedly executed in the input layer 201, the convolutional layer 202, the pooling layer 203 and the fully-connected layer 204 until the inference process is completed and the image, voice or text data is finally recognized.
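Putting the pieces together, a schematic per-layer forward pass implied by steps 1301 to 1307 might look as follows. The layer object and its weight_activation_pairs and accumulate methods are hypothetical glue used only to show the control flow; quantize and multiply_fixed_point are the sketches given earlier.

    def forward(layers, input_activation):
        activation = input_activation
        for layer in layers:
            outputs = []
            for w, x in layer.weight_activation_pairs(activation):
                f1, q1 = quantize(x, layer.alpha1, layer.shift1, layer.xp1)
                f2, q2 = quantize(w, layer.alpha2, layer.shift2, layer.xp2)
                outputs.append(multiply_fixed_point(q1, f1, q2, f2,
                                                    layer.alpha1, layer.shift1,
                                                    layer.alpha2, layer.shift2))
            activation = layer.accumulate(outputs)   # sums, bias, nonlinearity
        return activation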
Another embodiment of the present disclosure is a method of back propagation in a neural network, in which error values are propagated backward and the back-propagation gradients are quantized; it may likewise be implemented using the hardware of fig. 3 or fig. 12. For convenience of explanation, the following description is made in conjunction with the embodiment of fig. 3; fig. 14 shows a flowchart of the method of this embodiment. As before, the flow is described on the basis of the foregoing hardware design, but the disclosure is not limited to a particular hardware design, which may be implemented with digital circuits, analog circuits, or the like. The physical implementation of the hardware structure includes but is not limited to transistors, memristors, etc., and the artificial intelligence processor may be any appropriate hardware processor, such as a CPU, GPU, FPGA, DSP, ASIC, etc. Unless otherwise specified, the memory, storage device and storage unit may be any suitable magnetic or magneto-optical storage medium, such as RRAM, DRAM, SRAM, EDRAM, HBM, HMC, and the like.
In step 1401, the quantization unit 32 receives the next-layer error value xR. Referring to fig. 2, back propagation returns the error value of the output back through the fully-connected layer 204, the pooling layer 203, the convolutional layer 202 and the input layer 201. For the final output node, the difference between the activation value generated by the network and the actual value is taken as the next-layer error value xR; the next-layer error value xR is the input data x of the device of fig. 3.
In step 1402, the control unit 31 provides the partition parameter αR, the shift value shiftR and the division value xpR corresponding to the next-layer error value xR. It should be noted that the forward partition parameter α, shift value shift and division value xp differ from the backward partition parameter αR, shift value shiftR and division value xpR; the division value xpR can likewise be obtained by calculation according to equation (4).
In step 1403, the quantization unit 32 quantizes the next-layer error value xR according to the partition parameter αR, the shift value shiftR and the division value xpR to generate error-value fixed-point data. In more detail, the comparator 322 judges, based on the division value xpR, whether the next-layer error value xR falls within a first interval A defined by equation (1). If the next-layer error value xR falls within the first interval A, the two-way selector 323, the adder 324 and the quantizer 325 generate the intermediate fixed-point data xR[n-1:0] according to the following expressions:
[Expressions shown only as images in the original publication.]
If the next-layer error value xR does not fall within the first interval A, the two-way selector 323, the adder 324 and the quantizer 325 generate the intermediate fixed-point data xR[n-1:0] according to the following expressions:
[Expressions shown only as images in the original publication.]
Finally, the bus converter 326 adds the flag bit flag to generate the complete error-value fixed-point data xR[n:0].
In step 1404, the calculation unit 33 performs a multiplication operation on the error-value fixed-point data xR[n:0] and the weight fixed-point data x2[n:0] to generate the gradient of the weight x2. The weight fixed-point data x2[n:0] may have been quantized in the forward propagation flow (step 1305). In this embodiment, the calculation unit 33 has the structure of the multiplication unit 80 of fig. 8 and multiplies the error-value fixed-point data xR[n:0] by the weight fixed-point data x2[n:0] to generate the gradient of the weight x2. As mentioned above, in the data structure of the fixed-point data of this disclosure, the flag bit records the interval in which the error-value fixed-point data xR[n:0] and the weight fixed-point data x2[n:0] fall, the sign bit records their signs, and the value bits record their values VR and V2. The multiplier 81 multiplies the error-value fixed-point data xR[n:0] by the weight fixed-point data x2[n:0] to generate a signed value sign_t; the task of the first selector 821 is equivalent to multiplying the partition parameter αR by the flag bit xR[n] of the error-value fixed-point data to generate a first parameter product value pm1; the task of the second selector 822 is equivalent to multiplying the second partition parameter α2 by the flag bit x2[n] of the weight fixed-point data to generate a second parameter product value pm2. Finally, the fixed-point-to-floating-point converter 83 evaluates the following expression to generate the gradient of the weight x2:
[Expression shown only as an image in the original publication.]
In step 1405, the quantization unit 32 performs fixed-point conversion on the gradient to generate the current-layer error value. The detailed process of the fixed-point conversion is as described above and is not repeated.
In step 1406, the control unit 31 adjusts the weight x2 according to the current-layer error value. The back propagation algorithm starts from the output layer of the neural network model and works out the gradients of the model layer by layer using the chain rule of differentiation, so as to adjust the weight x2.
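The description states only that the weight is adjusted according to the current-layer error value via the chain rule; as an assumed example of such an adjustment, a plain stochastic-gradient-descent step could be used, where the learning rate lr is a hypothetical parameter not mentioned in the patent.

    def update_weight(weight, weight_gradient, lr=0.01):
        # assumed SGD-style adjustment of the weight by its gradient
        return weight - lr * weight_gradient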
Another embodiment of the present disclosure is a method for training a neural network. In general, the error value of the neural network is calculated from the output value obtained by forward propagation, the error value is then propagated back toward the input, and the weight of each layer is adjusted according to that layer's error value, so that the inference result of the neural network model is closer to the actual situation. In other words, the training method of this embodiment performs the forward propagation process of fig. 13 to obtain the next-layer activation value, calculates the next-layer error value from the next-layer activation value, executes the back propagation process of fig. 14 to generate the current-layer error value, and then adjusts the weight according to the current-layer error value. Proceeding backward layer by layer in this way yields an appropriate weight for each layer.
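A schematic training step implied by this paragraph, under the same assumptions as the earlier sketches, might be written as follows; forward and update_weight are the sketches above, and the layer.backward method is hypothetical glue standing in for steps 1401 to 1405.

    def train_step(layers, input_data, target, lr=0.01):
        activation = forward(layers, input_data)          # forward pass (fig. 13)
        error = activation - target                       # output-node error value
        for layer in reversed(layers):
            grads, error = layer.backward(error)          # quantized multiplies, steps 1401-1405
            layer.weights = [update_weight(w, g, lr)
                             for w, g in zip(layer.weights, grads)]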
Fig. 15 is a block diagram illustrating an integrated circuit device 1500 according to an embodiment of the disclosure. As shown in fig. 15, the integrated circuit device 1500 includes a computing device 1502, the computing device 1502 being a neural network computing device in a plurality of the embodiments described above. Additionally, the integrated circuit device 1500 also includes a general interconnect interface 1504 and other processing devices 1506.
The other processing device 1506 may be one or more of general purpose and/or special purpose processors such as a central processing unit, a graphics processing unit, an artificial intelligence processing unit, etc., and the number thereof is not limited but determined according to actual needs. The other processing device 1506 serves as an interface for the computing device 1502 to external data and controls, and performs basic controls including, but not limited to, data transfer, turning on and off the computing device 1502, and the like. Other processing devices 1506 may also cooperate with the computing device 1502 to perform computational tasks.
The universal interconnect interface 1504 may be used to transfer data and control instructions between the computing device 1502 and other processing devices 1506. For example, the computing device 1502 may obtain required input data from the other processing devices 1506 via the universal interconnect interface 1504 and write the input data to memory locations on the computing device 1502. Further, the computing device 1502 may obtain control instructions from the other processing devices 1506 via the universal interconnect interface 1504 to write to a control cache on the computing device 1502. Alternatively or in addition, the universal interconnect interface 1504 may also read data from a memory module of the computing device 1502 and transmit the data to the other processing device 1506.
The integrated circuit device 1500 also includes a storage device 1508 that can be coupled to the computing device 1502 and other processing devices 1506, respectively. The storage device 1508 is used to store data of the computing device 1502 and the other processing device 1506, and is particularly suitable for storing all data that cannot be stored in the internal storage of the computing device 1502 or the other processing device 1506.
According to different application scenarios, the integrated circuit device 1500 can be used as a System On Chip (SOC) for devices such as mobile phones, robots, unmanned aerial vehicles, video acquisition, and the like, thereby effectively reducing the core area of a control part, increasing the processing speed, and reducing the overall power consumption. In this case, the universal interconnect interface 1504 of the integrated circuit device 1500 is connected to certain components of the apparatus. Some of the components herein may be, for example, a camera, a display, a mouse, a keyboard, a network card or a wifi interface.
The present disclosure also discloses a chip or an integrated circuit chip, which includes the integrated circuit device 1500. The present disclosure also discloses a chip package structure including the above chip.
Another embodiment of the present disclosure is a board card including the above chip package structure. Referring to fig. 16, in addition to the plurality of chips 1602 described above, the board 1600 may include other components, including a memory device 1604, an interface device 1606 and a control device 1608.
The memory device 1604 is coupled to the chip 1602 within the chip package structure via a bus 1616 for storing data. Memory device 1604 may include multiple groups of memory cells 1610.
Interface device 1606 is electrically connected to chip 1602 within the chip package. The interface device 1606 is used for data transmission between the chip 1602 and an external device 1612 (e.g., a server or a computer). In this embodiment, the interface device 1606 is a standard PCIe interface, and the data to be processed is transmitted from the server to the chip 1602 through the standard PCIe interface, so as to implement data transfer. The results of the computations performed by chip 1602 are also transferred back to external device 1612 by interface device 1606.
The control device 1608 is electrically connected to the chip 1602 to monitor the state of the chip 1602. Specifically, the chip 1602 and the control device 1608 may be electrically connected through an SPI interface. The control device 1608 may include a single chip microprocessor ("MCU").
Another embodiment of the present disclosure is an electronic device or apparatus, which includes the board card 1600. According to different application scenarios, the electronic device or apparatus may include a data processing apparatus, a robot, a computer, a printer, a scanner, a tablet computer, a smart terminal, a mobile phone, a vehicle data recorder, a navigator, a sensor, a camera, a server, a cloud server, a camera, a video camera, a projector, a watch, an earphone, a mobile storage, a wearable device, a vehicle, a household appliance, and/or a medical device. The vehicle comprises an airplane, a ship and/or a vehicle; the household appliances comprise a television, an air conditioner, a microwave oven, a refrigerator, an electric cooker, a humidifier, a washing machine, an electric lamp, a gas stove and a range hood; the medical equipment comprises a nuclear magnetic resonance apparatus, a B-ultrasonic apparatus and/or an electrocardiograph.
Another embodiment of the present disclosure is an electronic device comprising one or more processors and a memory, the memory having stored therein computer-executable instructions that, when executed by the one or more processors, cause the electronic device to perform the method as described above, in particular to perform the method as described in fig. 7, 9, 13 and 14.
Another embodiment of the disclosure is a computer-readable storage medium having stored thereon computer-executable instructions for computing data within a computing device, which when executed by one or more processors perform the method as described above, in particular the method as described in fig. 7, 9, 13 and 14.
The present disclosure provides a technical scheme of adaptive non-uniform low-bit fixed-point training. By introducing multiple quantization intervals, a non-uniform fixed-point data format and corresponding hardware, it solves the technical problem that the quantization bit width is not low enough to effectively reduce energy consumption, and achieves the following technical effects:
1. the memory occupation of the network model is reduced, the resource consumption of model training is reduced, and the training speed is improved.
2. When the neural network model disclosed by the invention is used for reasoning, the reasoning speed can be accelerated, and the precision deterioration can be reduced.
3. When training the neural network model of the present disclosure, the problems of training collapse and non-convergence can be avoided.
The foregoing may be better understood in light of the following clauses:
clause a1, a neural network computing device, comprising: a control unit for providing segmentation parameters; the quantization unit is used for quantizing floating point data according to the segmentation parameters to generate fixed point data; and the calculating unit is used for calculating the neural network by using the fixed point data.
Clause a2, the neural network computing device of clause a1, wherein the control unit comprises a split value generator to generate split values according to the expression:
xp = 2^α · max{abs(D)}
wherein xp is the split value, α is the segmentation parameter and D is the data distribution.
Clause A3, the neural network computing device of clause a2, wherein the control unit comprises an offset value generator to generate an offset value according to the expression:
[Expression shown only as an image in the original publication.]
wherein x is the floating point data, the ceil function returns the minimum integer greater than or equal to the expression, and b is the quantization bit width.
Clause a4, the neural network computing device of clause A3, wherein the quantization unit comprises: an absolute value operator for taking an absolute value of the floating-point data; and a comparator for comparing the division value with the absolute value, wherein if the division value is greater than the absolute value, a value of a flag bit is set to 1, and if the division value is less than or equal to the absolute value, a value of a flag bit is set to 0.
Clause a5, the neural network computing device of clause a4, wherein the quantifying unit further comprises: a two-way selector to: when the numerical value of the flag bit is 1, setting an output value as the segmentation parameter; and when the value of the flag bit is 0, setting the output value to be 0.
Clause a6, the neural network computing device of clause a5, wherein the quantifying unit further comprises: an adder to add the offset value to the output value to generate a quantized offset value.
Clause a7, the neural network computing device of clause a6, wherein the quantifying unit further comprises: and the quantizer is used for quantizing the floating point data into intermediate data according to the quantization offset value.
Clause A8, the neural network computing device of clause a7, wherein the quantifying unit further comprises: and the bus converter is used for combining the numerical value of the zone bit and the intermediate data to generate the fixed point data.
Clause a9, the neural network computing device of clause A8, wherein the flag bit is 1 bit, the intermediate data is n bits, and the fixed-point data is n+1 bits.
Clause a10, the neural network computing device of clause A3, wherein the partition parameter is a negative integer, the quantization bit width is a positive integer, the relationship of the partition parameter to the quantization bit width is:
50%×b≤abs(α)≤90%×b
clause a11, the neural network computing device of clause a10, wherein the quantization bit width is 8, the partition parameter is one of-4, -5, -6, -7.
Clause a12, the neural network computing device of clause a1, wherein the floating point data is one of a weight, an activation value, and a gradient.
Clause a13, a board comprising the neural network computing device of any one of clauses a 1-12.
Clause a14, a method of computing a neural network, comprising: receiving floating point data that computes the neural network, the floating point data falling within a numerical distribution; dividing the numerical distribution into a first interval and a second interval based on a division parameter; judging whether the floating point data falls within the first interval or not; if so, quantizing the floating-point data according to the segmentation parameters to generate fixed-point data; and calculating the neural network using the setpoint data.
Clause a15, the method of clause a14, wherein the first interval is:
x | x < 2^α · max{abs(D)}
wherein x is the floating point data, α is the partition parameter, and D is the data distribution.
Clause a16, the method of clause a15, wherein the quantizing step produces the fixed-point data according to the expression:
[Expression shown only as an image in the original publication.]
interval = 2^shift
[Expression shown only as an image in the original publication.]
wherein xq is the fixed point data, the ceil function returns the minimum integer greater than or equal to the expression, and b is the quantization bit width.
Clause a17, the method of clause a16, wherein the partition parameter is a negative integer, the quantization bit width is a positive integer, and the relationship between the partition parameter and the quantization bit width is:
50%×b≤abs(α)≤90%×b
clause a18, the method of clause a17, wherein the quantization bit width is 8, the partition parameter is one of-4, -5, -6, -7.
Clause a19, the method of clause a14, wherein the second interval is:
x | x ≥ 2^α · max{abs(D)}
wherein x is the floating point data, α is the partition parameter, and D is the data distribution.
Clause a20, the method of clause a19, wherein when the determining step determines that the floating point data does not fall within the first interval, quantizing the floating point data according to the following expression to produce the fixed point data:
[Expression shown only as an image in the original publication.]
interval = 2^shift
[Expression shown only as an image in the original publication.]
wherein xq is the fixed point data, the ceil function returns the minimum integer greater than or equal to the expression, and b is the quantization bit width.
Clause a21, the method of clause a14, wherein the numerical distribution is a gaussian distribution, the first interval being a positively and negatively symmetric interval including 0.
Clause a22, the method of clause a21, wherein the precision of the first interval and the second interval is non-uniform, the partitioning parameter increasing the precision of the first interval.
Clause a23, the method of clause a14, wherein the floating point data is one of a weight, an activation value, and a gradient.
Clause a24, a computer-readable storage medium comprising computer-executable instructions that, when executed by one or more processors, perform the method of any of clauses a 14-23.
The foregoing detailed description of the embodiments of the present disclosure has been presented for purposes of illustration and description; it is intended to be exemplary only and is not intended to be exhaustive or to limit the disclosure to the precise forms disclosed. For those skilled in the art, there may be variations in the specific embodiments and the application scope based on the ideas of the present disclosure; in summary, the contents of this specification should not be construed as limiting the present disclosure.

Claims (24)

1. A neural network computing device, comprising:
a control unit for providing segmentation parameters;
the quantization unit is used for quantizing floating point data according to the segmentation parameters to generate fixed point data; and
and the calculating unit is used for calculating the neural network by using the fixed point data.
2. The neural network computing device of claim 1, wherein the control unit includes a split value generator to generate split values according to the following expression:
xp = 2^α · max{abs(D)}
wherein xp is the split value, α is the segmentation parameter and D is the data distribution.
3. The neural network computing device of claim 2, wherein the control unit includes an offset value generator to generate an offset value according to the following expression:
[Expression shown only as an image in the original publication.]
wherein x is the floating point data, the ceil function returns the minimum integer greater than or equal to the expression, and b is the quantization bit width.
4. The neural network computing device of claim 3, wherein the quantization unit comprises: an absolute value operator for taking an absolute value of the floating-point data; and
a comparator for comparing the division value with the absolute value, wherein if the division value is greater than the absolute value, a value of a flag bit is set to 1, and if the division value is less than or equal to the absolute value, a value of a flag bit is set to 0.
5. The neural network computing device of claim 4, wherein the quantization unit further comprises:
a two-way selector to:
when the numerical value of the flag bit is 1, setting an output value as the segmentation parameter; and
and when the numerical value of the flag bit is 0, setting the output value to be 0.
6. The neural network computing device of claim 5, wherein the quantization unit further comprises:
an adder to add the offset value to the output value to generate a quantized offset value.
7. The neural network computing device of claim 6, wherein the quantization unit further comprises:
and the quantizer is used for quantizing the floating point data into intermediate data according to the quantization offset value.
8. The neural network computing device of claim 7, wherein the quantization unit further comprises:
and the bus converter is used for combining the numerical value of the zone bit and the intermediate data to generate the fixed point data.
9. The neural network computing device of claim 8, wherein the flag bit is 1 bit, the intermediate data is n bits, and the fixed-point data is n+1 bits.
10. The neural network computing device of claim 3, wherein the partition parameter is a negative integer, the quantization bit width is a positive integer, the partition parameter having a relationship to the quantization bit width of:
50% × b ≤ abs(α) ≤ 90% × b.
11. the neural network computing device of claim 10, wherein the quantization bit width is 8, the partition parameter is one of-4, -5, -6, -7.
12. The neural network computing device of claim 1, wherein the floating point data is one of a weight, an activation value, and a gradient.
13. A board comprising the neural network computing device of any one of claims 1-12.
14. A method of computing a neural network, comprising:
receiving floating point data that computes the neural network, the floating point data falling within a numerical distribution;
dividing the numerical distribution into a first interval and a second interval based on a division parameter;
judging whether the floating point data falls within the first interval or not;
if so, quantizing the floating-point data according to the segmentation parameters to generate fixed-point data; and
and calculating the neural network by using the fixed point data.
15. The method of claim 14, wherein the first interval is:
x | x < 2^α · max{abs(D)}
wherein x is the floating point data, α is the partition parameter, and D is the data distribution.
16. The method of claim 15, wherein the quantizing step generates the fixed-point data according to the expression:
[Expression shown only as an image in the original publication.]
interval = 2^shift
[Expression shown only as an image in the original publication.]
wherein xq is the fixed point data, the ceil function returns the minimum integer greater than or equal to the expression, and b is the quantization bit width.
17. The method of claim 16, wherein the partition parameter is a negative integer, the quantization bit width is a positive integer, the partition parameter has a relationship to the quantization bit width of:
50% × b ≤ abs(α) ≤ 90% × b.
18. the method of claim 17, wherein the quantization bit width is 8, the partition parameter is one of-4, -5, -6, -7.
19. The method of claim 14, wherein the second interval is:
x | x ≥ 2^α · max{abs(D)}
wherein x is the floating point data, α is the partition parameter, and D is the data distribution.
20. The method of claim 19, wherein when said determining step determines that said floating point data does not fall within said first interval, quantizing said floating point data to produce said fixed point data according to the following expression:
[Expression shown only as an image in the original publication.]
interval = 2^shift
[Expression shown only as an image in the original publication.]
wherein xq is the fixed point data, the ceil function returns the minimum integer greater than or equal to the expression, and b is the quantization bit width.
21. The method of claim 14, wherein the numerical distribution is a gaussian distribution, the first interval being a positive and negative symmetric interval including 0.
22. The method of claim 21, wherein the precision of the first interval and the second interval is non-uniform, the partitioning parameter increasing the precision of the first interval.
23. The method of claim 14, wherein the floating point data is one of a weight, an activation value, and a gradient.
24. A computer-readable storage medium comprising computer-executable instructions that, when executed by one or more processors, perform the method of any one of claims 14-23.
CN202010457225.5A 2020-05-26 2020-05-26 Neural network computing method and device, board card and computer readable storage medium Pending CN113723599A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010457225.5A CN113723599A (en) 2020-05-26 2020-05-26 Neural network computing method and device, board card and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010457225.5A CN113723599A (en) 2020-05-26 2020-05-26 Neural network computing method and device, board card and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN113723599A true CN113723599A (en) 2021-11-30

Family

ID=78672235

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010457225.5A Pending CN113723599A (en) 2020-05-26 2020-05-26 Neural network computing method and device, board card and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN113723599A (en)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150039854A1 (en) * 2013-08-01 2015-02-05 Nuance Communications, Inc. Vectorized lookup of floating point values
CN109871949A (en) * 2017-12-22 2019-06-11 泓图睿语(北京)科技有限公司 Convolutional neural networks accelerator and accelerated method
US20190354842A1 (en) * 2018-05-17 2019-11-21 Qualcomm Incorporated Continuous relaxation of quantization for discretized deep neural networks
CN109993296A (en) * 2019-04-01 2019-07-09 北京中科寒武纪科技有限公司 Quantify implementation method and Related product
CN110728358A (en) * 2019-09-30 2020-01-24 上海商汤智能科技有限公司 Data processing method and device based on neural network
CN110889503A (en) * 2019-11-26 2020-03-17 中科寒武纪科技股份有限公司 Data processing method, data processing device, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
US5506797A (en) Nonlinear function generator having efficient nonlinear conversion table and format converter
CN110852416B (en) CNN hardware acceleration computing method and system based on low-precision floating point data representation form
CN110889503B (en) Data processing method, data processing device, computer equipment and storage medium
US20200401873A1 (en) Hardware architecture and processing method for neural network activation function
KR20190062129A (en) Low-power hardware acceleration method and system for convolution neural network computation
WO2022111002A1 (en) Method and apparatus for training neural network, and computer readable storage medium
CN110852434A (en) CNN quantization method, forward calculation method and device based on low-precision floating point number
CN112651496A (en) Hardware circuit and chip for processing activation function
CN110110852B (en) Method for transplanting deep learning network to FPAG platform
CN112101541B (en) Device, method, chip and board card for splitting high-bit-width data
CN114003198A (en) Inner product processing unit, arbitrary precision calculation device, method, and readable storage medium
WO2022163861A1 (en) Neural network generation device, neural network computing device, edge device, neural network control method, and software generation program
CN113238987B (en) Statistic quantizer, storage device, processing device and board card for quantized data
CN112085176A (en) Data processing method, data processing device, computer equipment and storage medium
WO2021245370A1 (en) Modulo operation unit
CN113723599A (en) Neural network computing method and device, board card and computer readable storage medium
CN113723598A (en) Neural network computing method and device, board card and computer readable storage medium
CN113723597A (en) Neural network computing method and device, board card and computer readable storage medium
CN113723600A (en) Neural network computing method and device, board card and computer readable storage medium
US20220121908A1 (en) Method and apparatus for processing data, and related product
CN113112009B (en) Method, apparatus and computer-readable storage medium for neural network data quantization
CN111258537B (en) Method, device and chip for preventing data overflow
CN113298843B (en) Data quantization processing method, device, electronic equipment and storage medium
CN114692865A (en) Neural network quantitative training method and device and related products
CN114580625A (en) Method, apparatus, and computer-readable storage medium for training neural network

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination