
CN113723598A - Neural network computing method and device, board card and computer readable storage medium

Info

Publication number
CN113723598A
Authority
CN
China
Prior art keywords: interval, point data, quantization, value, data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010457215.1A
Other languages
Chinese (zh)
Inventor
Inventor not disclosed (不公告发明人)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Cambricon Information Technology Co Ltd
Original Assignee
Shanghai Cambricon Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Cambricon Information Technology Co Ltd filed Critical Shanghai Cambricon Information Technology Co Ltd
Priority to CN202010457215.1A
Publication of CN113723598A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Neurology (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The disclosure relates to a neural network computing method, device, board card, and computer-readable storage medium. The neural network computing device of the disclosure is included in an integrated circuit device that also comprises a universal interconnection interface and other processing devices. The neural network computing device interacts with the other processing devices to jointly complete the computing operation designated by the user. The integrated circuit device may further include a storage device connected to the neural network computing device and the other processing devices, respectively, for storing data of the neural network computing device and the other processing devices.

Description

Neural network computing method and device, board card and computer readable storage medium
Technical Field
The present disclosure relates generally to the field of neural networks. More particularly, the present disclosure relates to a neural network computing method, apparatus, board, and computer-readable storage medium.
Background
Modern neural networks keep increasing their depth, width, and resolution in order to process massive amounts of data, so the storage footprint and computational load of the models keep growing, and training and inference consume an extreme amount of resources; those skilled in the art therefore urgently need ways to reduce this energy consumption.
Quantizing a neural network can compress the model with little loss of precision, reduce computation power consumption, and accelerate model inference and training. Quantization replaces the floating-point numbers in the network with low-bit fixed-point numbers, so the data width can be reduced from the original 32 bits (single-precision floating point) to 8 or even 4 bits. Because of these advantages, quantized networks have become a common technique in this field.
When training a quantized network, the back-propagation gradients must be quantized in addition to the weights and activation values. Computing the back-propagation gradients costs roughly twice as much as forward propagation. Extensive experiments show that during fixed-point quantized training the back-propagation gradients require higher quantization precision than the forward-propagation weights and activation values; that is, the quantization bit width of the back-propagation gradients must be higher than that of the forward-propagation weights and activations. As a result, the gradient bit width cannot currently be made low enough and is generally kept at 16 bits, so the benefit of model compression is limited. This has become the main factor restricting fixed-point quantization of the training process on edge devices; training cannot be completed effectively, and the neural network therefore cannot be applied.
Therefore, a technical problem exists at the present stage: existing quantization schemes cannot effectively reduce energy consumption during inference and training.
Disclosure of Invention
In order to at least partially solve the technical problems mentioned in the background, the present disclosure provides a method, an apparatus, a board, and a computer-readable storage medium for neural network computation.
In one aspect, the present disclosure discloses a neural network computing device including a control unit, a quantization unit, and a computation unit. The control unit is used for providing segmentation parameters; the quantization unit is used for quantizing floating point data according to the segmentation parameters to generate fixed point data; the calculating unit is used for calculating the neural network by using the fixed point data.
In another aspect, the present disclosure discloses a multiplication unit including a multiplier, an addition module, and a fixed-point to floating-point converter. The multiplier is used for multiplying first fixed-point data and second fixed-point data to generate a fixed-point product; the addition module is used for adding a plurality of quantization offset coefficients corresponding to the first fixed-point data and the second fixed-point data to generate a quantization offset coefficient sum; the fixed-point to floating-point converter is used for converting the fixed-point product into floating-point data according to the quantization offset coefficient sum.
In another aspect, the present disclosure discloses an integrated circuit device including the neural network computing device or the multiplication unit, and a board including the integrated circuit device.
In another aspect, the present disclosure discloses a method of computing a neural network, comprising: receiving floating-point data for computing the neural network, the floating-point data falling within a numerical distribution; dividing the numerical distribution into a first interval and a second interval based on a division parameter; judging whether the floating-point data falls within the first interval; if so, quantizing the floating-point data according to the division parameter to generate fixed-point data; and calculating the neural network using the fixed-point data.
In another aspect, the present disclosure discloses a method of computing a neural network based on floating-point data, comprising: presetting a plurality of quantization intervals, wherein each quantization interval corresponds to one quantization formula and each quantization formula exhibits a different quantization gradient; determining that the floating-point data falls within a particular interval of the plurality of quantization intervals; quantizing the floating-point data according to the quantization formula corresponding to the particular interval to generate fixed-point data; and calculating the neural network using the fixed-point data.
In another aspect, the present disclosure discloses a method of computing a neural network based on floating-point data, comprising: presetting a plurality of quantization intervals; determining that the floating-point data falls within a particular interval of the plurality of quantization intervals; quantizing the floating-point data according to the particular interval to generate fixed-point data; setting a flag of N bits in the data structure of the fixed-point data, wherein the flag bits record the particular interval and N is a positive integer; and calculating the neural network using the fixed-point data.
In another aspect, the present disclosure discloses a method of computing a neural network, comprising: providing a first division parameter and a second division parameter; quantizing first floating-point data according to the first division parameter to generate first fixed-point data; quantizing second floating-point data according to the second division parameter to generate second fixed-point data; multiplying the first fixed-point data and the second fixed-point data to generate intermediate data; and calculating the neural network from the intermediate data.
In another aspect, the present disclosure discloses a method of forward propagation in a neural network, comprising: receiving the activation values and weights required for computing this layer; providing a first division parameter, a first offset value, and a first division value corresponding to the activation values; providing a second division parameter, a second offset value, and a second division value corresponding to the weights; quantizing the activation values according to the first division parameter, the first offset value, and the first division value to generate first fixed-point data; quantizing the weights according to the second division parameter, the second offset value, and the second division value to generate second fixed-point data; multiplying the first fixed-point data and the second fixed-point data to generate intermediate data; performing floating-point calculation on the intermediate data to generate the activation values of the next layer; and repeating these steps for each layer so as to complete the neural network computation.
In another aspect, the present disclosure discloses a method of back propagation in a neural network, the neural network including weights and weight fixed-point data obtained by quantizing the weights, comprising: receiving a next-layer error value; providing a division parameter, an offset value, and a division value corresponding to the next-layer error value; quantizing the next-layer error value according to the division parameter, the offset value, and the division value to generate error-value fixed-point data; multiplying the error-value fixed-point data and the weight fixed-point data to generate the gradient of the weights; performing fixed-point calculation on the gradient to generate this layer's error value; and adjusting the weights according to this layer's error value.
In another aspect, the present disclosure discloses a method of training a neural network, comprising: performing forward propagation; calculating a next-layer error value according to the next-layer activation values; performing back propagation; and adjusting the weights according to this layer's error value. The step of performing forward propagation comprises: receiving the activation values and weights required for computing this layer; quantizing the activation values according to a first division parameter, a first offset value, and a first division value corresponding to the activation values to generate first fixed-point data; quantizing the weights according to a second division parameter, a second offset value, and a second division value corresponding to the weights to generate second fixed-point data; multiplying the first fixed-point data and the second fixed-point data to generate intermediate data; and performing floating-point calculation on the intermediate data to generate the next-layer activation values. The step of performing back propagation comprises: quantizing the next-layer error value according to a third division parameter, a third offset value, and a third division value corresponding to the next-layer error value to generate third fixed-point data; multiplying the second fixed-point data and the third fixed-point data to generate the gradient of the weights; and performing fixed-point calculation on the gradient to generate this layer's error value.
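For illustration only, the following NumPy sketch outlines one such training step for a single fully connected layer. It replaces the disclosed non-uniform quantization with a plain symmetric per-tensor quantizer (the hypothetical quantize() helper) and is a simplified sketch of the flow, not the patented implementation:

```python
import numpy as np

def quantize(x, bits=8):
    # Hypothetical symmetric per-tensor quantizer, used only for illustration;
    # it maps the largest magnitude of x onto the largest representable integer.
    max_abs = max(float(np.max(np.abs(x))), 1e-12)
    scale = max_abs / (2 ** (bits - 1) - 1)
    return np.round(x / scale).astype(np.int32), scale

def train_step(x, w, err_next, lr=0.01):
    # Forward pass: quantize the activation and the weight, multiply in fixed
    # point, then rescale the integer product back to floating point.
    xq, sx = quantize(x)
    wq, sw = quantize(w)
    y = (xq @ wq) * (sx * sw)                 # next-layer activation (pre-nonlinearity)

    # Backward pass: quantize the next-layer error, reuse the quantized weights,
    # and compute the weight gradient and this layer's error.
    gq, sg = quantize(err_next)
    grad_w = (xq.T @ gq) * (sx * sg)          # gradient of the weights
    err_this = (gq @ wq.T) * (sg * sw)        # error propagated back to this layer

    w_new = w - lr * grad_w                   # adjust the weights
    return y, err_this, w_new

# Example shapes: batch of 4, 16 inputs, 8 outputs.
rng = np.random.default_rng(0)
y, err, w2 = train_step(rng.normal(size=(4, 16)),
                        rng.normal(size=(16, 8)),
                        rng.normal(size=(4, 8)))
```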
In another aspect, the present disclosure discloses an electronic device comprising one or more processors and a memory. The memory has stored therein computer-executable instructions that, when executed by the one or more processors, cause the electronic device to perform any of the methods as previously described.
In another aspect, the present disclosure discloses a computer-readable storage medium comprising computer-executable instructions that, when executed by one or more processors, perform any of the methods as previously described.
The present disclosure provides an adaptive non-uniform low-bit fixed-point training scheme. By introducing multiple quantization intervals, a non-uniform fixed-point data format, and the corresponding hardware, it solves the technical problem that existing quantization schemes cannot effectively reduce energy consumption during inference and training, and achieves the technical effects of reducing the memory footprint of the network model, reducing the resources consumed by model training, and improving training speed.
Drawings
The above and other objects, features and advantages of exemplary embodiments of the present disclosure will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. In the drawings, several embodiments of the disclosure are illustrated by way of example and not by way of limitation, and like or corresponding reference numerals indicate like or corresponding parts and in which:
FIG. 1 is an input-output diagram illustrating a sigmoid function;
FIG. 2 is a diagram showing a four-layer structure of a neural network;
FIG. 3 is a schematic diagram illustrating a neural network computing device of an embodiment of the present disclosure;
FIG. 4 is a schematic diagram illustrating a control unit of an embodiment of the present disclosure;
FIG. 5 is a graph showing a possible numerical distribution of input data for an embodiment of the present disclosure;
FIG. 6 is a schematic diagram illustrating an 8-bit non-uniform fixed point quantization data structure of an embodiment of the present disclosure;
FIG. 7 is a flow chart illustrating a method of computing a neural network of another embodiment of the present disclosure;
FIG. 8 is a schematic diagram showing a multiplication unit of another embodiment of the present disclosure;
FIG. 9 is a flow diagram illustrating a method of performing a multiplicative computation neural network according to another embodiment of the present disclosure;
FIG. 10 is a schematic diagram illustrating a non-uniform fixed point quantization data structure of another embodiment of the present disclosure;
FIG. 11 is a graph showing a possible numerical distribution of input data for another embodiment of the present disclosure;
FIG. 12 is a schematic diagram illustrating a neural network computing device of another embodiment of the present disclosure;
FIG. 13 is a flow chart illustrating a method of forward propagation of another embodiment of the present disclosure;
FIG. 14 is a flow chart illustrating a method of back propagation of another embodiment of the present disclosure;
FIG. 15 is a block diagram illustrating an integrated circuit device of another embodiment of the present disclosure; and
FIG. 16 is a structural diagram showing a board card according to another embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are some, but not all embodiments of the present disclosure. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.
It should be understood that the terms "first," "second," "third," and "fourth," etc. in the claims, description, and drawings of the present disclosure are used to distinguish between different objects and are not used to describe a particular order. The terms "comprises" and "comprising," when used in the specification and claims of this disclosure, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the disclosure herein is for the purpose of describing particular embodiments only, and is not intended to be limiting of the disclosure. As used in the specification and claims of this disclosure, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should be further understood that the term "and/or" as used in the specification and claims of this disclosure refers to any and all possible combinations of one or more of the associated listed items and includes such combinations.
As used in this specification and claims, the term "if" may be interpreted contextually as "when", "upon" or "in response to a determination" or "in response to a detection".
Specific embodiments of the present disclosure are described in detail below with reference to the accompanying drawings.
Neural networks are built on the concept of the neuron. A neuron resembles a perceptron: the activation function of an ordinary perceptron is a step function, while the activation function of a neuron is commonly the sigmoid function, defined as follows:
f(x) = 1 / (1 + e^(-x))
The input-output curve of the sigmoid function is shown in FIG. 1; it maps any real number into the interval between 0 and 1, which makes it suitable for classification.
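A one-line NumPy version of this function, for reference:

```python
import numpy as np

def sigmoid(x):
    # Maps any real number into the open interval (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(np.array([-5.0, 0.0, 5.0])))   # approx. [0.0067, 0.5, 0.9933]
```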
A neural network is a plurality of neuron systems connected according to a certain rule. Taking a convolutional neural network as an example, the convolutional neural network is roughly composed of the following four layer structures: an input layer, a convolution layer, a pooling layer, and a fully connected layer. Fig. 2 is a four-layer structure diagram showing a neural network 200.
The input layer 201 extracts part of the information from the input data and converts it into a feature matrix, which carries the features corresponding to that part of the information. The input data here may include, but is not limited to, image data, voice data, or text data.
Convolutional layer 202 receives the feature matrix from the input layer 201 and performs feature extraction on the input data through convolution operations. In practice, the convolutional layer 202 may consist of multiple convolutional layers. Taking image data as an example, the convolutional layers in the first half capture local and detailed information of the image: each pixel of an output feature map is computed from only a small receptive field of the input image, and the receptive field of subsequent convolutional layers grows layer by layer so as to capture more complex and abstract information. After several convolutional layers, abstract representations of the image at different scales are obtained. Although the convolution operations complete the feature extraction of the input image, the resulting feature maps carry too much information and too many dimensions, which is not only time-consuming to compute but also prone to overfitting, so the dimensionality must be reduced further.
Pooling layer 203 is configured to replace a region of data with a value that is typically the maximum or average of all values in the region. If maximum is used, it is called maximum pooling; if an average is used, it is called mean pooling. By pooling, the model size can be reduced and the computation speed increased without losing too much information.
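For reference, a minimal NumPy sketch of 2x2 maximum and mean pooling (an illustrative helper, not part of the disclosure) is:

```python
import numpy as np

def pool2x2(feature_map, mode="max"):
    # Replace each non-overlapping 2x2 region with its maximum or its average.
    h, w = feature_map.shape
    blocks = feature_map[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3)) if mode == "max" else blocks.mean(axis=(1, 3))

fm = np.arange(16, dtype=float).reshape(4, 4)
print(pool2x2(fm, "max"))    # [[ 5.  7.] [13. 15.]]
print(pool2x2(fm, "mean"))   # [[ 2.5  4.5] [10.5 12.5]]
```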
The fully connected layer 204 acts as the classifier of the whole convolutional neural network 200. It is equivalent to a feature-space transformation that extracts and integrates all the useful information; together with the nonlinear mapping of the activation functions, multiple fully connected layers can in theory approximate any nonlinear transformation. The layer compares the integrated information against the different classes to determine whether the input data resembles the compared object.
A neural network needs weights as its model parameters, and training the neural network seeks the optimal values of these weights. In addition, some parameters, such as the connection pattern of the neural network, the number of layers, and the number of nodes per layer, are not obtained by learning but are set in advance. These manually set parameters are called hyper-parameters.
The neural network is divided into forward propagation and backward propagation, the forward propagation is forward calculation in the direction shown in fig. 2, and the state value and the activation value of each neuron are calculated sequentially from an input layer to an output layer. The back propagation is calculated from the output to the input in reverse in order to find the gradient values.
When input data is computed in a neural network, one generally wants to find the solution that minimizes a loss function, indicating the result closest to the real situation. However, the loss function of a neural network is complex and an optimal analytical expression is difficult to find. The common practice is to move along the negative gradient direction, because the negative gradient is the direction in which the loss function decreases fastest; the back-propagation algorithm computes this gradient so that the model parameters can subsequently be updated by gradient descent. The back-propagation algorithm therefore starts from the output layer of the neural network model and works out the gradient of the model layer by layer using the chain rule of differentiation, in the expectation of obtaining the solution that minimizes the loss function.
Computed this way, the gradient of each neural unit is calculated only once, with no repeated calculation. The fundamental reason this direction of calculation is efficient is that the gradient of an earlier-stage unit depends on that of the later-stage unit: the later-stage gradient is computed first and then reused when computing the earlier stage, so previously computed results are fully reused and repeated calculation is avoided.
In neural network training, regularization penalties and the ReLU activation function are often used. The regularization penalty addresses overfitting: to match all the data, a poorly generalizing high-order function oscillates strongly, the oscillation makes the derivatives large, and large parameters are needed to fit every sample; adding a penalty term punishes the case of large parameters and thereby avoids such strongly oscillating parameters.
As for the ReLU activation function, when the model has N layers the activation rate of the neurons theoretically drops by a factor of 2^N, and ReLU better allows such a sparse model to mine relevant features and fit the training data. Furthermore, ReLU has the following advantages over other activation functions. Compared with linear functions, ReLU has stronger expressive power, which is especially evident in deep networks. Compared with other nonlinear functions, the gradient of ReLU is constant over the non-negative interval, so there is no vanishing-gradient problem and the convergence rate of the model remains stable. The vanishing-gradient problem means that when the gradient is smaller than 1, the error between the predicted value and the true value is attenuated at every propagated layer; if the sigmoid function is used as the activation function in a deep model, this phenomenon is particularly pronounced and the model cannot converge.
Because of the regularization penalty and the effect of the ReLU activation function, the value distributions of the weights, activations, and gradients of each layer are non-uniform; throughout training they concentrate near 0, for example in a roughly Gaussian-like distribution. Since the values are concentrated around 0, uniform quantization quantizes many values close to 0 straight to 0, which seriously distorts the direction of network training. For example, with 7-bit uniform quantization, values whose absolute value is smaller than one quantization step are quantized to 0. Such quantization is intolerable for training. Under prior-art uniform quantization schemes, the factor limiting the accuracy of network training mainly comes from the quantization error of the gradient, especially the quantization error of gradient values near 0 in the numerical distribution.
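The effect can be illustrated with a small NumPy experiment that applies a simple symmetric 7-bit uniform quantizer to Gaussian-like data and measures how many values collapse to exactly 0 (the quantizer and the data are illustrative assumptions, not the scheme of this disclosure):

```python
import numpy as np

rng = np.random.default_rng(0)
grads = rng.normal(0.0, 0.01, size=100_000)   # gradient-like values clustered near 0

step = np.max(np.abs(grads)) / (2 ** 7)       # step of a 7-bit uniform quantizer
q = np.round(grads / step)                    # uniform quantization
print("quantization step:", step)
print("fraction of values quantized to exactly 0:", np.mean(q == 0))
```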
The present disclosure provides an adaptive non-uniform low-bit quantization scheme, which is suitable for the training and inference tasks of various neural networks (such as convolutional neural networks, recurrent neural networks, and graph neural networks). As mentioned above, two types of parameters are involved in training a neural network: one is the ordinary parameters, i.e., parameter data obtained by training, such as the weight w and bias b in y = wx + b; the other is the hyper-parameters, i.e., parameter data that cannot be obtained by training and must be set before learning starts.
The apparatus in the embodiments of the present disclosure, and the various devices, units, and modules described below, may be implemented in the form of hardware circuits, such as digital circuits or analog circuits. Physical implementations of the hardware structures include, but are not limited to, transistors, memristors, and the like. Unless otherwise specified, the artificial intelligence processor mentioned in the embodiments may be any suitable hardware processor, such as a CPU, GPU, FPGA, DSP, or ASIC. Unless otherwise specified, the memory, storage device, and storage unit may be any suitable magnetic or magneto-optical storage medium, such as RRAM (Resistive Random Access Memory), DRAM (Dynamic Random Access Memory), SRAM (Static Random-Access Memory), EDRAM (Enhanced Dynamic Random Access Memory), HBM (High-Bandwidth Memory), or HMC (Hybrid Memory Cube).
Alternatively, when the devices, units, and modules described below are implemented as an ASIC, the implementation offers advantages in power consumption, reliability, and area over other hardware, and is especially suitable for high-performance, low-power mobile terminals.
One embodiment of the present disclosure is a neural network computing device that divides a numerical distribution of floating point input data into a first interval and a second interval, where different quantization modes are used for floating point data falling in different intervals. A schematic diagram of this embodiment is shown in fig. 3.
The neural network computing device of this embodiment includes a control unit 31, a quantization unit 32, and a calculation unit 33. The control unit 31 is used for providing various parameters and hyper-parameters required in the quantization process; the quantization unit 32 quantizes floating point data according to the parameter and the hyper-parameter to generate fixed point data; the calculation unit 33 is used for calculating the neural network by using the fixed point data. The units of fig. 3 may be implemented in hardware circuits.
The control unit 31 provides the division parameter α, the offset value shift, the division value x_p, and other parameters required in the quantization process. As shown in FIG. 4, the control unit 31 includes an input/output module 311, a division parameter generator 312, an offset value generator 313, and a division value generator 314.
The input/output module 311 serves as a channel for signal transmission between the control unit 31 and the external unit, and can output a signal when a control request occurs, and at the same time, can receive a control signal from the external unit, and send the control signal to the division parameter generator 312, the offset value generator 313, or the division value generator 314.
The division parameter generator 312 is used to generate a hyper-parameter: the division parameter α. The division parameter α is used to define the first interval and the second interval. If the input data x (a floating-point number) is distributed within the numerical distribution D, the relationship between the division parameter α and the first and second intervals A and B is:

A = {x | abs(x) < 2^α × max{abs(D)}, x ∈ D}   (1)

B = {x | abs(x) ≥ 2^α × max{abs(D)}, x ∈ D}   (2)
referring to FIG. 5, the Gaussian distribution curve represents the possible value distribution of the input data x, the input data x substantially completely falls within the value distribution D, and the first interval A isA positive-negative symmetrical section including 0, i.e., a range between two imaginary lines in the drawing, and a range outside the imaginary lines is a second section B. When the input data x is less than 2αmax { abs (d) }, which indicates that the input data x falls within the first interval a; when the input data x is greater than or equal to 2αmax { abs (d) }, indicates that the input data x falls within the second interval B.
The division parameter α determines the position of the dashed lines in FIG. 5, that is, the range of the first interval A; the smaller the absolute value of the division parameter α, the larger the range of the first interval A. The division parameter α is related to the quantization bit width b, which is the number of bits of the fixed-point number; the quantization bit width b is a positive integer and the division parameter α is a negative integer. In this embodiment their relationship is:
50%×b≤abs(α)≤90%×b
Taking the quantization bit width b = 8 (i.e., an 8-bit fixed-point number) as an example, the division parameter α may be one of -4, -5, -6, or -7.
The offset value generator 313 is configured to generate the offset value shift, which represents a quantization offset. The offset value generator 313 generates the offset value shift according to the following expression:
shift = ceil(log2(max{abs(D)} / (2^(n-1) - 1)))   (3)
wherein the ceil function returns the smallest integer greater than or equal to the expression.
The division value generator 314 is used to generate the division value x_p. The division value x_p is the boundary between the first interval A and the second interval B on the abscissa of FIG. 5, that is, the abscissa corresponding to the dashed lines in the figure. The division value generator 314 generates the division value x_p according to the following expression:

x_p = 2^α × max{abs(D)}   (4)
The division parameter α, the offset value shift, and the division value x_p are the parameters required in the quantization process; the parameters generated by the control unit 31 are sent to the quantization unit 32 through the input/output module 311.
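A minimal Python sketch of how such parameters might be derived is shown below. The shift formula assumes that equation (3) maps the largest magnitude of D onto the largest representable (n-1)-bit magnitude; this is an assumption for illustration, not a statement of the exact original formula:

```python
import math
import numpy as np

def control_parameters(data, n_bits=7, alpha=-5):
    # data:   floating-point values of one tensor, drawn from the distribution D
    # n_bits: width of the sign + value fields (the flag bit is appended later)
    # alpha:  negative division parameter; for a total bit width b = n_bits + 1
    #         it is chosen so that 50% * b <= abs(alpha) <= 90% * b
    max_abs = float(np.max(np.abs(data)))
    # Offset value (equation (3)), assuming it maps max_abs onto the largest
    # representable magnitude 2^(n_bits - 1) - 1.
    shift = math.ceil(math.log2(max_abs / (2 ** (n_bits - 1) - 1)))
    # Division value (equation (4)): boundary between interval A and interval B.
    x_p = 2.0 ** alpha * max_abs
    return shift, x_p

shift, x_p = control_parameters(np.random.default_rng(0).normal(0, 0.05, 1000))
print(shift, x_p)
```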
Optionally, in an embodiment, the control unit 31 may also have other hardware circuit implementations, which are not described herein again.
Returning to fig. 3, the quantization unit 32 includes: absolute value operator 321, comparator 322, two-way selector 323, adder 324, quantizer 325 and bus converter 326.
The absolute value operator 321 receives the input data x, takes the absolute value of the floating-point input data x, and outputs the absolute value abs(x). The comparator 322 receives the absolute value abs(x) and the division value x_p and compares them. If the division value x_p is greater than the absolute value abs(x), the input data x falls within the first interval A (see equation (1)) and the comparator 322 sets the flag bit flag to 1; if the division value x_p is less than or equal to the absolute value abs(x), the input data x falls within the second interval B (see equation (2)) and the flag is set to 0. The value of flag thus records whether the input data x falls in the first interval A or the second interval B.
The two-way selector 323 receives the division parameter α from the control unit 31 and, based on the value of the flag bit flag, decides whether to output the division parameter α or 0. When the flag is 1, the input data x falls within the first interval A; values in this interval, which includes 0, are easily quantized to 0 and need finer precision, so the two-way selector 323 sets the output value to the division parameter α. When the flag is 0, the input data x falls within the second interval B; values in this interval need no precision adjustment, so the two-way selector 323 sets the output value to 0.
The adder 324 receives the output of the two-way selector 323 and the offset value shift from the control unit 31, and adds the offset value shift to the output value of the two-way selector 323 to generate a quantization offset value s. That is, when the flag value is 1, the quantization offset value s is the division parameter α plus the offset value shift; when the value of flag is 0, quantization offset value s is offset value shift.
Quantizer 325 quantizes the input data x into n-bit fixed point data, which is quantized by:
x[n-1:0] = round(x / 2^s)   (5)
where 2^s is called the quantization interval (interval), the round function performs rounding to the nearest integer, and x[n-1:0] is the fixed-point data quantized from the input data x. However, the output x[n-1:0] of the quantizer 325 is only intermediate data, not the final quantization result.
The bus converter 326 combines the flag value with the intermediate data x[n-1:0] to generate the fixed-point data x[n:0]; more specifically, it appends the flag value to the intermediate data x[n-1:0], so the final fixed-point data x[n:0] is n+1 bits.
In another scenario, if the flag value is not important to the computing unit 33, this embodiment may omit the bus converter 326, and the output x[n-1:0] of the quantizer 325 is the final quantization result, transmitted directly to the computing unit 33.
Optionally, in an embodiment, the quantization unit 32 may also have other hardware circuit implementations, which are not described herein again.
This embodiment works with a new non-uniform fixed-point quantization data structure; the data format of the non-uniform fixed-point data x[n:0] is described below. FIG. 6 shows an 8-bit non-uniform fixed-point quantization data structure, which includes a sign bit 61, value bits 62, and a flag bit 63. The sign bit 61 is 1 bit, the most significant bit (MSB), and records the sign of the fixed-point data. The flag bit 63 is 1 bit, the least significant bit (LSB), and records the value of the flag bit flag. The middle 6 bits are the value bits 62, which record the value of the fixed-point data. The output x[n-1:0] of the quantizer 325 corresponds to the sign bit 61 and the value bits 62 of the non-uniform fixed-point quantization data structure, and the bus converter 326 records the value of the flag bit flag in the flag bit 63 to generate the complete fixed-point data x[n:0]. Based on the non-uniform fixed-point quantization data structure of FIG. 6, the relationship between the corresponding floating-point input data x and the sign bit 61, the value bits 62, and the flag bit 63 is:
x = (-1)^sign × value × 2^(α·flag) × interval   (6)
where sign is the value of sign bit 61 and value is the value of value bit 62.
If the fixed-point number is not 8 bits, but n, then under this fixed-point data structure, its most significant and least significant bits are again sign bit 61 and flag bit 63, and the middle n-2 bits are value bits 62.
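A small Python sketch of this layout, assuming the 8-bit case of FIG. 6 with sign/magnitude coding of the value field (an illustrative assumption), may look as follows:

```python
def pack_fixed_point(value, sign, flag):
    # 8-bit layout of FIG. 6: [ sign | 6-bit magnitude | flag ], sign/magnitude coding.
    assert 0 <= value < 2 ** 6 and sign in (0, 1) and flag in (0, 1)
    return (sign << 7) | (value << 1) | flag

def unpack_fixed_point(word, alpha, shift):
    sign = (word >> 7) & 0x1
    value = (word >> 1) & 0x3F
    flag = word & 0x1
    # Equation (6): x = (-1)^sign * value * 2^(alpha * flag) * interval, interval = 2^shift
    return (-1) ** sign * value * 2.0 ** (alpha * flag + shift)

# Example: magnitude 37, positive, falling in interval A (flag = 1), alpha = -5, shift = -6.
w = pack_fixed_point(37, sign=0, flag=1)
print(format(w, "08b"), unpack_fixed_point(w, alpha=-5, shift=-6))
```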
Returning to FIG. 3, in the quantization unit 32 the floating-point input data x is converted into fixed-point data x[n:0], which is sent to the calculation unit 33 for calculation. The calculation unit 33 may perform a specific calculation according to actual requirements, for example the convolution calculation of the fully connected layer 204 in FIG. 2. After the calculation is completed an intermediate result y[n:0] is generated; the intermediate result y[n:0] is also fixed-point data, and the calculation unit 33 restores the intermediate result y[n:0] to floating-point data y, completing the whole calculation process.
The foregoing description uses the input data x as the activation value, but the disclosure is not limited thereto, that is, in the neural network, any data that needs to be quantized (such as weight and gradient, etc.) can be converted into fixed point data by the quantization unit 32.
In summary, this embodiment implements a method of computing a neural network, and FIG. 7 shows a flowchart of this method. The flow of this embodiment is described with reference to the foregoing hardware design, but it should be understood that the present disclosure does not limit the hardware implementation, which may be realized with digital or analog circuits. The physical implementation of the hardware structure includes, but is not limited to, transistors, memristors, and the like, and the artificial intelligence processor mentioned in the embodiments may be any suitable hardware processor, such as a CPU, GPU, FPGA, DSP, or ASIC. Unless otherwise specified, the memory, storage device, and storage unit may be any suitable magnetic or magneto-optical storage medium, such as RRAM, DRAM, SRAM, EDRAM, HBM, or HMC.
In step 71, the absolute value operator 321 receives floating point data of the computational neural network, which falls within the numerical distribution. Referring to fig. 3, the absolute value operator 321 receives input data x, takes an absolute value of the input data x in a floating-point format, and outputs an absolute value abs (x). It is particularly emphasized that in this embodiment, a series of floating-point data is received, each of which is processed according to the flow of FIG. 7.
In step 72, the control unit 31 divides the numerical distribution into a first interval and a second interval based on the division parameter. Referring to FIG. 4, the division parameter generator 312 of the control unit 31 generates the division parameter α, which defines the first interval A and the second interval B according to equations (1) and (2).
In step 73, the comparator 322 determines whether the floating-point data falls within the first interval. Referring to FIG. 3, the comparator 322 receives the absolute value abs(x) and the division value x_p and compares them. If the division value x_p is greater than the absolute value abs(x), the input data x falls within the first interval A and the comparator 322 sets the flag bit flag to 1; if the division value x_p is less than or equal to the absolute value abs(x), the input data x falls within the second interval B and the flag is set to 0.
If the floating-point data falls within the first interval, step 74 is performed, in which the two-way selector 323, the adder 324, the quantizer 325, and the bus converter 326 quantize the floating-point data according to the division parameter to generate the fixed-point data. Referring to FIG. 3, the two-way selector 323 receives the division parameter α from the control unit 31 and decides whether to output the division parameter α or 0 based on the value of the flag bit flag. Since the floating-point data falls within the first interval, the flag has the value 1 and the two-way selector 323 outputs the division parameter α. The quantization offset value s output by the adder 324 is then the division parameter α plus the offset value shift. The quantizer 325 quantizes the input data x into the intermediate data x_q based on equation (5), and the bus converter 326 then appends the value of flag to the intermediate data x_q to generate the fixed-point data x[n:0].
If the floating-point data does not fall within the first interval, step 75 is performed: the two-way selector 323, the adder 324, the quantizer 325, and the bus converter 326 quantize the floating-point data according to the following expressions, rather than according to the division parameter, to generate the fixed-point data x_q:
shift = ceil(log2(max{abs(D)} / (2^(n-1) - 1)))   (7)

interval = 2^shift   (8)

x_q = round(x / interval)   (9)
These expressions are not substantially different from expressions (3) and (5), except that the floating-point data falls within the second interval B, so the flag value is 0 and the two-way selector 323 outputs 0. The quantization offset value s output by the adder 324 is then just the offset value shift. The quantizer 325 quantizes the input data x into the intermediate data x_q, and the bus converter 326 appends the flag value to the intermediate data x_q to generate the fixed-point data x[n:0].
In step 76, the calculation unit 33 calculates the neural network using the fixed-point data x[n:0]. Referring to FIG. 3, the calculation unit 33 performs the specific calculation required by the application, and after the calculation is completed it generates an intermediate result y[n:0]; the intermediate result y[n:0] is also fixed-point data, and the calculation unit 33 restores it to floating-point data y, completing the whole calculation process.
This embodiment is "adaptive" because the comparator 322 generates the flag bit flag to record whether the input data x falls in the first interval A or the second interval B, so that the quantization unit 32 can select the appropriate precision for quantization. Furthermore, the first interval A and the second interval B have different precisions: the first interval A, which includes 0, is sliced more finely using the 2^α factor, so that a large amount of data close to 0 is not quantized to 0. This embodiment is therefore also "non-uniform".
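Putting steps 71 to 76 together, a NumPy sketch of the whole flow could look as follows; it uses the same assumed shift formula as the earlier parameter sketch and folds the sign into a signed integer instead of a separate sign bit:

```python
import math
import numpy as np

def quantize_tensor(data, n_bits=7, alpha=-5):
    # Adaptive non-uniform quantization of one tensor (steps 71-76 of FIG. 7).
    # Assumes the shift of equation (3) maps max|D| onto the largest
    # (n_bits-1)-bit magnitude; the sign is kept inside the signed integer q.
    max_abs = float(np.max(np.abs(data)))
    shift = math.ceil(math.log2(max_abs / (2 ** (n_bits - 1) - 1)))
    x_p = 2.0 ** alpha * max_abs                      # division value, equation (4)

    flags = (np.abs(data) < x_p).astype(np.int32)     # 1: interval A, 0: interval B
    s = shift + alpha * flags                         # quantization offset value s
    q = np.round(data / 2.0 ** s).astype(np.int32)    # equation (5)
    return q, flags, shift

def dequantize_tensor(q, flags, shift, alpha=-5):
    # Inverse mapping following equation (6).
    return q * 2.0 ** (shift + alpha * flags)

x = np.random.default_rng(1).normal(0.0, 0.02, 1000)
q, flags, shift = quantize_tensor(x)
print("max abs error:", np.max(np.abs(x - dequantize_tensor(q, flags, shift))))
```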
For the neural network, the calculating unit 33 of the foregoing embodiment can perform an important calculation, namely, matrix multiplication, where the matrix multiplication involves multiplication of a large number of fixed-point numbers, and the present disclosure proposes a multiplication unit with a special structure based on the fixed-point data structure of fig. 6.
When two fixed-point data having the fixed-point data structure of FIG. 6 are multiplied, assume that the first fixed-point data x_1 falls within a first numerical distribution and the second fixed-point data x_2 falls within a second numerical distribution; the first division parameter α_1 divides the first numerical distribution into a first interval and a second interval, the second division parameter α_2 divides the second numerical distribution into a third interval and a fourth interval, the first flag value flag_1 reflects whether the first fixed-point data x_1 falls in the first or the second interval, and the second flag value flag_2 reflects whether the second fixed-point data x_2 falls in the third or the fourth interval. Based on equation (6), the floating-point product x_1 × x_2 is:

x_1 × x_2 = (-1)^(sign_1 + sign_2) × value_1 × value_2 × 2^(α_1·flag_1 + α_2·flag_2) × interval_1 × interval_2   (10)
another embodiment of the present disclosure is a multiplication unit, which is schematically shown in fig. 8, for implementing the calculation of equation (10). The multiplication unit 80 of this embodiment includes a multiplier 81, an addition module 82, and a fixed-point to floating-point converter 83.
The multiplier 81 multiplies the first fixed-point data x_1[n-1:0] by the second fixed-point data x_2[n-1:0] to generate the fixed-point product y[2n-1:0], i.e., it implements the factor

(-1)^(sign_1 + sign_2) × value_1 × value_2

of equation (10). The first fixed-point data x_1[n-1:0] and the second fixed-point data x_2[n-1:0] may be taken from the output of the quantizer 325 of FIG. 3, or from the output of the bus converter 326 with the least-significant flag bit removed, leaving only the sign bit and the value bits. Since the first fixed-point data x_1[n-1:0] and the second fixed-point data x_2[n-1:0] are each n bits, the fixed-point product y[2n-1:0] is 2n bits.
The addition module 82 adds the quantization offset coefficients corresponding to the first fixed-point data x_1[n-1:0] and the second fixed-point data x_2[n-1:0] to generate a quantization offset coefficient sum, i.e., it implements the exponent

α_1·flag_1 + α_2·flag_2 + shift_1 + shift_2

of equation (10). As shown in equation (8), the factor interval_1 × interval_2 of equation (10) is equal to 2^(shift_1 + shift_2). The quantization offset coefficients therefore involve the division parameter α, the flag value flag, and the offset value shift.
The addition module 82 includes a first selector 821, a second selector 822, and an adder 823. The first selector 821 sets a first output value to the first division parameter α_1 or to 0 according to the first flag value flag_1. When the first flag value flag_1 is 1, i.e., the flag bit x_1[n] has the value 1, the first fixed-point data x_1[n-1:0] falls in the first interval and the first division parameter α_1 needs to participate in the calculation, so α_1 is output; when the first flag value flag_1 is 0, i.e., the flag bit x_1[n] has the value 0, the first fixed-point data x_1[n-1:0] falls in the second interval and the first division parameter α_1 does not participate in the calculation, so 0 is output. Likewise, the second selector 822 sets a second output value to the second division parameter α_2 or to 0 according to the second flag value flag_2; its operation is the same as that of the first selector 821 and is not repeated here.
The adder 823 adds the first offset value shift_1, the second offset value shift_2, the first output value, and the second output value to generate the quantization offset coefficient sum s, i.e., s = α_1·flag_1 + α_2·flag_2 + shift_1 + shift_2, where the first offset value shift_1 and the second offset value shift_2 are calculated according to equation (3).
The fixed-point to floating-point converter 83 converts the fixed-point product y[2n-1:0] into floating-point data y according to the quantization offset coefficient sum s. The fixed-point to floating-point converter 83 includes a power-of-2 calculator 831 and a multiplier 832. The power-of-2 calculator 831 takes the quantization offset coefficient sum s as the exponent and generates the quantization offset value 2^s, realizing the factor 2^(α_1·flag_1 + α_2·flag_2) × interval_1 × interval_2 of equation (10). The multiplier 832 multiplies the fixed-point product y[2n-1:0] by the quantization offset value 2^s to obtain the floating-point product x_1 × x_2.
Optionally, in an embodiment, the multiplication unit 80 may also have other hardware circuit implementations, which are not described herein again.
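A compact Python sketch of this multiplication path, under the same assumptions as the quantization sketch above (the variable names are illustrative), is:

```python
def fixed_point_multiply(q1, flag1, alpha1, shift1,
                         q2, flag2, alpha2, shift2):
    # Multiplier 81: integer product of the two sign + value fields.
    product = q1 * q2
    # Addition module 82: quantization offset coefficient sum s of equation (10).
    s = alpha1 * flag1 + alpha2 * flag2 + shift1 + shift2
    # Fixed-point to floating-point converter 83: scale the product by 2^s.
    return product * 2.0 ** s

# Example: one operand from interval A (flag = 1), one from interval B (flag = 0).
y = fixed_point_multiply(q1=25, flag1=1, alpha1=-5, shift1=-9,
                         q2=-12, flag2=0, alpha2=-6, shift2=-7)
print(y)   # (25 * 2^-14) * (-12 * 2^-7) = -300 * 2^-21
```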
In summary, this embodiment implements a method of computing a neural network by multiplication, and FIG. 9 shows a flowchart of this method. The flow of this embodiment is described with reference to the foregoing hardware design, but it should be understood that the present disclosure does not limit the hardware implementation, which may be realized with digital or analog circuits. The physical implementation of the hardware structure includes, but is not limited to, transistors, memristors, and the like, and the artificial intelligence processor mentioned in the embodiments may be any suitable hardware processor, such as a CPU, GPU, FPGA, DSP, or ASIC. Unless otherwise specified, the memory, storage device, and storage unit may be any suitable magnetic or magneto-optical storage medium, such as RRAM, DRAM, SRAM, EDRAM, HBM, or HMC.
In step 91, the control unit 31 provides the first division parameter α_1 and the second division parameter α_2. In more detail, the division parameter generator 312 of the control unit 31 generates the first division parameter α_1 and the second division parameter α_2.
In step 92, the quantization unit 32 quantizes the first floating-point data x_1 according to the first division parameter α_1 to generate the first fixed-point data x_1[n:0]. The comparator 322 generates the division value x_p from the first division parameter α_1 according to equation (4), dividing the first numerical distribution into a first interval A and a second interval B, and then judges whether the first floating-point data x_1 falls within the first interval A defined by equation (1). If the first floating-point data x_1 falls within the first interval A, the two-way selector 323, the adder 324, and the quantizer 325 generate the intermediate fixed-point data x_1[n-1:0] according to the following expressions:

shift_1 = ceil(log2(max{abs(D_1)} / (2^(n-1) - 1)))

interval_1 = 2^(α_1 + shift_1)

x_1[n-1:0] = round(x_1 / interval_1)
If the first floating-point data x_1 does not fall within the first interval A, the two-way selector 323, the adder 324, and the quantizer 325 generate the intermediate fixed-point data x_1[n-1:0] according to the following expressions:

shift_1 = ceil(log2(max{abs(D_1)} / (2^(n-1) - 1)))

interval_1 = 2^(shift_1)

x_1[n-1:0] = round(x_1 / interval_1)

Finally, the bus converter 326 appends the flag bit flag to generate the complete first fixed-point data x_1[n:0].
In step 93, the quantization unit 32 quantizes the second floating-point data x_2 according to the second division parameter α_2 to generate the second fixed-point data x_2[n:0]. Likewise, the comparator 322 generates the division value x_p from the second division parameter α_2 according to equation (4), dividing the second numerical distribution into a third interval and a fourth interval, and then judges whether the second floating-point data x_2 falls within the third interval defined by equation (1). If the second floating-point data x_2 falls within the third interval, the two-way selector 323, the adder 324, and the quantizer 325 generate the intermediate fixed-point data x_2[n-1:0] according to the following expressions:

shift_2 = ceil(log2(max{abs(D_2)} / (2^(n-1) - 1)))

interval_2 = 2^(α_2 + shift_2)

x_2[n-1:0] = round(x_2 / interval_2)
If the second floating-point data x_2 does not fall within the third interval, the two-way selector 323, the adder 324, and the quantizer 325 generate the intermediate fixed-point data x_2[n-1:0] according to the following expressions:

shift_2 = ceil(log2(max{abs(D_2)} / (2^(n-1) - 1)))

interval_2 = 2^(shift_2)

x_2[n-1:0] = round(x_2 / interval_2)

Finally, the bus converter 326 appends the flag bit flag to generate the complete second fixed-point data x_2[n:0].
In step 94, the multiplication unit 80 multiplies the first fixed-point data x_1[n:0] and the second fixed-point data x_2[n:0] to generate the intermediate data y. As mentioned above, in the data structure of the fixed-point data of the present disclosure, the flag bit records the interval in which the first fixed-point data x_1[n:0] and the second fixed-point data x_2[n:0] fall, the sign bit records their signs, and the value bits record their values V_1 and V_2. The multiplier 81 multiplies the first fixed-point data x_1[n:0] and the second fixed-point data x_2[n:0] to generate the sign sum sign_t; the task of the first selector 821 is equivalent to multiplying the first division parameter α_1 by the flag bit x_1[n] of the first fixed-point data to generate the first parameter multiplication value pm_1; the task of the second selector 822 is equivalent to multiplying the second division parameter α_2 by the flag bit x_2[n] of the second fixed-point data to generate the second parameter multiplication value pm_2. Finally, the fixed-point to floating-point converter 83 executes the following expression to generate the intermediate data:

y = (-1)^(sign_t) × V_1 × V_2 × 2^(pm_1 + pm_2 + shift_1 + shift_2)
in step 95, a neural network is calculated from the intermediate data. In the neural network architecture shown in fig. 2, the input data may be various, such as image, voice, text data, etc., and the data undergoes a large number of multiplications in the input layer 201, the convolutional layer 202, the pooling layer 203, and the fully-connected layer 204, and all of the multiplications can be implemented by the aforementioned steps until the inference process is completed and the image, voice, text data are finally recognized. In addition to the input data, the parameters in the neural network can be converted into fixed point data by the steps and then multiplied by the input data.
Because data close to 0 being quantized to 0 during quantization disturbs the calculation, the above embodiments divide the value distribution of the input data into two intervals: one is a symmetric interval around 0, and the other is the region outside it. However, the present disclosure does not limit the number of intervals; the distribution may be divided into several intervals whenever value ranges need to be quantized with different precision.
When there are more than two intervals, the data structure of FIG. 6 only needs the size of the flag bit 63 adjusted. Taking 3 or 4 intervals as an example, in order to record which of the 3 or 4 intervals a datum falls in, the flag bit 63 needs 2 bits, as shown by the flag bits in FIG. 10. In other words, if the numerical distribution is divided into N intervals, the flag bit requires ceil(log2(N)) bits.
Another embodiment of the present disclosure is a neural network computing device. It differs from the foregoing embodiments in that it divides the numerical distribution of the floating point data into a first interval, a second interval and a third interval, and adopts a different quantization mode when the floating point data falls into a different interval. As shown in fig. 11, the numerical distribution D is divided into a first interval A, a second interval B, and a third interval C according to the quantization accuracy.
This embodiment defines the 3 intervals with 2 segmentation parameters: the segmentation parameter α1 delimits the first interval A and the second interval B, and the segmentation parameter α2 delimits the second interval B and the third interval C. The relationship between the segmentation parameters α1, α2 and the first interval A, the second interval B and the third interval C is:
[Equation images not reproduced in this text: the definitions of the first interval A, the second interval B and the third interval C.]
When calculating the fixed-point data x_q for data falling in the first interval A, the following expressions may be employed:
[Equation images not reproduced in this text.]
When calculating the fixed-point data x_q for data falling in the second interval B, the following expressions may be employed:
[Equation images not reproduced in this text.]
When calculating the fixed-point data x_q for data falling in the third interval C, the following expressions may be employed:
interval_3 = 2^shift
[The remaining expressions are equation images not reproduced in this text.]
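By way of illustration only, the following Python sketch gives a behavioural model of the three-interval quantization just described. The exact expressions are equation images not reproduced in this text, so the sketch only follows the structure suggested by the hardware description (the selector outputs α1, α2 or 0, the adder adds it to shift) and the interval widths of clauses A3, A7 and A8 below; the form of the division values, the ceil rounding, the clipping and all function names are assumptions.

import math

def quantize_three_intervals(x, alpha1, alpha2, shift, b, max_abs):
    # Hypothetical behavioural model: the comparator picks the interval, the selector
    # outputs alpha1, alpha2 or 0, the adder adds it to shift, and the quantizer
    # divides by the resulting step.
    xp1 = (2.0 ** alpha1) * max_abs      # assumed boundary between interval A and B
    xp2 = (2.0 ** alpha2) * max_abs      # assumed boundary between interval B and C
    if abs(x) < xp1:
        pm, flag = alpha1, 0b00          # interval A: finest step
    elif abs(x) < xp2:
        pm, flag = alpha2, 0b01          # interval B: medium step
    else:
        pm, flag = 0.0, 0b10             # interval C: coarsest step
    step = 2.0 ** (shift + pm)
    value = math.ceil(x / step)
    limit = 2 ** (b - 1) - 1             # b is the quantization bit width
    value = max(-limit, min(limit, value))
    return value, flag

# Example: alpha1 < alpha2 < 0, so values near zero get the finest step
print(quantize_three_intervals(0.03, alpha1=-6, alpha2=-3, shift=-4, b=8, max_abs=1.0))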
Fig. 12 is a schematic diagram of this embodiment. Its framework is not significantly different from that of fig. 3; the differences lie only in the control unit 121, the comparator 122 and the three-way selector 123.
Compared with the control unit 31 of fig. 3, the control unit 121 outputs the segmentation parameters α1 and α2 to the three-way selector 123, and outputs the first division value x_p1 corresponding to the segmentation parameter α1 and the second division value x_p2 corresponding to the segmentation parameter α2 to the comparator 122. Their expressions are as follows:
[Equation images not reproduced in this text: the expressions for the first division value x_p1 and the second division value x_p2.]
Compared with the comparator 322 of fig. 3, the comparator 122 is implemented as a two-stage comparison circuit. The first stage compares the absolute value abs(x) of the input data x with the first division value x_p1. If abs(x) is smaller than x_p1, the input data x falls in the first interval A, so no comparison with the second division value x_p2 is needed, and the flag value flag = 00 is output. If abs(x) is not smaller than x_p1, the input data x falls in the second interval B or the third interval C, and the second-stage circuit compares abs(x) with the second division value x_p2. If abs(x) is smaller than x_p2, the input data x falls in the second interval B, so the flag value flag = 01 is output; if abs(x) is not smaller than x_p2, the input data x falls in the third interval C, so the flag value flag = 10 is output.
Compared with the two-way selector 323 of fig. 3, the three-way selector 123 receives the segmentation parameters α1 and α2 and, based on the value of the flag bit flag, outputs either α1, α2 or 0. When the flag value is 00, the input data x falls within the first interval A, so the three-way selector 123 outputs the segmentation parameter α1; when the flag value is 01, the input data x falls within the second interval B, so it outputs the segmentation parameter α2; when the flag value is 10, the input data x falls within the third interval C, so it outputs 0.
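By way of illustration only, the behaviour of the two-stage comparator 122 and the three-way selector 123 can be summarised in Python as follows; this is a behavioural sketch, and the function names and example values are hypothetical and not part of the disclosure.

def compare_two_stage(abs_x: float, xp1: float, xp2: float) -> int:
    # Two-stage comparison: first against x_p1, then (only if needed) against x_p2.
    if abs_x < xp1:
        return 0b00   # first interval A
    if abs_x < xp2:
        return 0b01   # second interval B
    return 0b10       # third interval C

def three_way_select(flag: int, alpha1: float, alpha2: float) -> float:
    # Output alpha1, alpha2 or 0 depending on the flag value, as the selector 123 does.
    return {0b00: alpha1, 0b01: alpha2, 0b10: 0.0}[flag]

flag = compare_two_stage(abs_x=0.2, xp1=0.05, xp2=0.4)
print(flag, three_way_select(flag, alpha1=-6.0, alpha2=-3.0))  # prints: 1 -3.0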
The operation of the other elements is the same as that of the corresponding elements in fig. 3, and therefore their description is omitted. Optionally, the embodiment of fig. 12 may also be realized with other hardware circuit implementations, which are not described here again.
The embodiment of fig. 12 is illustrated with 3 intervals, but the disclosure does not limit the number of intervals; those skilled in the art can readily extend it to more intervals without creative effort.
Another embodiment of the present disclosure is a method of forward propagation in a neural network, that is, the inference process of the neural network, which may be implemented using the apparatus of fig. 3 or fig. 12. For convenience of explanation, the following description is made in conjunction with the embodiment of fig. 3, and fig. 13 shows a flowchart of the method of this embodiment. The flow of this embodiment is described based on the foregoing hardware design, but it should be understood that the disclosure is not limited to a particular hardware design; the hardware may be implemented, for example, with digital circuits or analog circuits. The physical implementation of the hardware structure includes but is not limited to transistors, memristors, and the like, and the artificial intelligence processor mentioned in the embodiments may be any appropriate hardware processor, such as a CPU, GPU, FPGA, DSP, ASIC, and the like. Unless otherwise specified, the memory, storage device, and storage unit may be any suitable magnetic or magneto-optical storage medium, such as resistive random access memory (RRAM), dynamic random access memory (DRAM), static random access memory (SRAM), enhanced dynamic random access memory (EDRAM), high bandwidth memory (HBM), hybrid memory cube (HMC), and the like.
In step 1301, the absolute value operator 321 receives the activation value x1 and the weight x2 required for calculating the current layer. The activation value x1 and the weight x2, in floating point format, are input to the absolute value operator 321, which outputs the absolute value abs(x).
In step 1302, the control unit 31 provides the first segmentation parameter α1, the first shift value shift1 and the first division value x_p1 corresponding to the activation value x1. The first segmentation parameter α1, the first shift value shift1 and the first division value x_p1 are all as described in the foregoing embodiments and are not described again; the first division value x_p1 can be obtained by the calculation of equation (4).
In step 1303, the control unit 31 provides the second segmentation parameter α2, the second shift value shift2 and the second division value x_p2 corresponding to the weight x2. The second division value x_p2 can likewise be obtained by the calculation of equation (4).
In step 1304, the quantization unit 32 quantizes the activation value x1 based on the first segmentation parameter α1, the first shift value shift1 and the first division value x_p1 to generate the first fixed-point data. The comparator 322 judges, based on the first division value x_p1, whether the activation value x1 falls within the first interval A defined by formula (1). If the activation value x1 falls within the first interval A, the two-way selector 323, the adder 324 and the quantizer 325 generate the intermediate fixed-point data x1[n-1:0] according to the following expressions:
[Equation images not reproduced in this text.]
If the activation value x1 does not fall within the first interval A, the two-way selector 323, the adder 324 and the quantizer 325 generate the intermediate fixed-point data x1[n-1:0] according to the following expressions:
[Equation images not reproduced in this text.]
Finally, the bus converter 326 appends the flag bit flag to generate the complete first fixed-point data x1[n:0].
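By way of illustration only, the following Python sketch mirrors the two-interval quantization of steps 1304 and 1305 at a behavioural level. Since the actual expressions are equation images not reproduced in this text, the sketch follows the interval widths of clauses A3 and A5 below; the form of the division value, the ceil rounding, the clipping and the function names are assumptions.

import math

def quantize_two_intervals(x, alpha, shift, b, max_abs):
    # Hypothetical behavioural model of the quantization unit 32: the comparator
    # tests against the division value, the two-way selector outputs alpha or 0,
    # the adder adds it to shift, and the quantizer divides by the resulting step.
    xp = (2.0 ** alpha) * max_abs        # assumed form of the division value x_p
    if abs(x) < xp:
        pm, flag = alpha, 0              # first interval: finer step near zero
    else:
        pm, flag = 0.0, 1                # second interval: coarser step
    step = 2.0 ** (shift + pm)
    value = math.ceil(x / step)
    limit = 2 ** (b - 1) - 1             # b is the quantization bit width
    value = max(-limit, min(limit, value))
    return value, flag

# Illustrative calls for an activation value and a weight
print(quantize_two_intervals(0.01, alpha=-5, shift=-6, b=8, max_abs=1.0))
print(quantize_two_intervals(0.60, alpha=-5, shift=-6, b=8, max_abs=1.0))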
In step 1305, the quantization unit 32 quantizes the weight x2 based on the second segmentation parameter α2, the second shift value shift2 and the second division value x_p2 to generate the second fixed-point data. In more detail, the weight x2 falls within a second numerical distribution, which is divided into a third interval and a fourth interval. The comparator 322 judges, based on the second division value x_p2, whether the weight x2 falls within the third interval, which is also defined by equation (1). If the weight x2 falls within the third interval, the two-way selector 323, the adder 324 and the quantizer 325 generate the intermediate fixed-point data x2[n-1:0] according to the following expressions:
[Equation images not reproduced in this text.]
If the weight x2 does not fall within the third interval, the two-way selector 323, the adder 324 and the quantizer 325 generate the intermediate fixed-point data x2[n-1:0] according to the following expressions:
[Equation images not reproduced in this text.]
Finally, the bus converter 326 appends the flag bit flag to generate the second fixed-point data x2[n:0].
In step 1306, the calculation unit 33 performs a multiplication operation on the first fixed-point data x1[n:0] and the second fixed-point data x2[n:0] to generate intermediate data. In this embodiment, the calculation unit 33 has the structure of the multiplication unit 80 of fig. 8 and performs the multiplication on the first fixed-point data x1[n:0] and the second fixed-point data x2[n:0] to generate the intermediate data y. As mentioned above, in the data structure of the fixed-point data of the present disclosure, the flag bit records the interval of the first fixed-point data x1[n:0] and of the second fixed-point data x2[n:0], the sign bit records their signs, and the value bits record their values V1 and V2. The multiplier 81 multiplies the signs and values of the first fixed-point data x1[n:0] and the second fixed-point data x2[n:0] to generate a signed value sign_t; the first selector 821 selects the first parameter multiplication value pm1 based on the first segmentation parameter α1 and the flag bit x1[n] of the first fixed-point data; the second selector 822 selects the second parameter multiplication value pm2 based on the second segmentation parameter α2 and the flag bit x2[n] of the second fixed-point data; finally, the fixed-point to floating-point converter 83 executes the following expression to generate the intermediate data:
[Equation image not reproduced in this text.]
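By way of illustration only, the following Python sketch gives one possible reading of the multiplication unit 80: the value bits are multiplied, and the product is scaled by 2 raised to the sum of the shift values and the parameter multiplication values selected from the flag bits. The converter's actual expression is an equation image not reproduced in this text, so this reconstruction, including the function name and example values, is an assumption.

def fixed_point_multiply(v1, flag1, v2, flag2, alpha1, alpha2, shift1, shift2):
    # Hypothetical reconstruction: the multiplier 81 multiplies the (signed) value
    # bits, the selectors 821/822 pick alpha or 0 according to the flag bits, and
    # the converter 83 scales the product by 2 ** (shift1 + shift2 + pm1 + pm2).
    pm1 = alpha1 if flag1 == 0 else 0.0
    pm2 = alpha2 if flag2 == 0 else 0.0
    return (v1 * v2) * (2.0 ** (shift1 + shift2 + pm1 + pm2))

# Illustrative values only
print(fixed_point_multiply(v1=21, flag1=0, v2=39, flag2=1,
                           alpha1=-5, alpha2=-5, shift1=-6, shift2=-6))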
The intermediate data is the output result of the current layer and is also the input data, i.e., the activation value, of the next layer. The method returns to step 1301, the activation value obtained in this step is input to the next layer, and the execution is repeated until all layers have been calculated.
In step 1307, the neural network inference is completed. In the neural network architecture shown in fig. 2, the input data may take various forms, such as image, voice, or text data. These data repeatedly undergo the multiplications of the foregoing steps in the input layer 201, the convolutional layer 202, the pooling layer 203, and the fully-connected layer 204 until the inference process is completed and the image, voice, or text data is finally recognized.
Another embodiment of the present disclosure is a method of back propagation in a neural network, which quantizes the back-propagation gradient by propagating error values, and which may also be implemented using the hardware of fig. 3 or fig. 12. For convenience of explanation, the following description is made in conjunction with the embodiment of fig. 3, and fig. 14 shows a flowchart of the method of this embodiment. The flow of this embodiment is described based on the foregoing hardware design, but it should be understood that the disclosure is not limited to a particular hardware design; the hardware may be implemented, for example, with digital circuits or analog circuits. The physical implementation of the hardware structure includes but is not limited to transistors, memristors, and the like, and the artificial intelligence processor mentioned in the embodiments may be any appropriate hardware processor, such as a CPU, GPU, FPGA, DSP, ASIC, and the like. Unless otherwise specified, the memory, storage device, and storage unit may be any suitable magnetic or magneto-optical storage medium, such as resistive random access memory (RRAM), dynamic random access memory (DRAM), static random access memory (SRAM), enhanced dynamic random access memory (EDRAM), high bandwidth memory (HBM), hybrid memory cube (HMC), and the like.
In step 1401, the quantization unit 32 receives the next-layer error value x_R. Referring to fig. 2, back propagation returns the error value of the output back in the direction of the fully-connected layer 204, the pooling layer 203, the convolutional layer 202, and the input layer 201. For the final output node, the difference between the activation value generated by the network and the actual value is taken as the next-layer error value x_R, and the next-layer error value x_R is the input data x of the device of fig. 3.
In step 1402, the control unit 31 provides the segmentation parameter α_R, the shift value shift_R and the division value x_pR corresponding to the next-layer error value x_R. It should be noted that the forward segmentation parameter α, shift value shift and division value x_p may differ from the backward segmentation parameter α_R, shift value shift_R and division value x_pR; the division value x_pR can likewise be obtained by the calculation of equation (4).
In step 1403, the quantization unit 32 quantizes the next-layer error value x_R according to the segmentation parameter α_R, the shift value shift_R and the division value x_pR to generate the error-value fixed-point data. In more detail, the comparator 322 determines, based on the division value x_pR, whether the next-layer error value x_R falls within the first interval A defined by formula (1). If the next-layer error value x_R falls within the first interval A, the two-way selector 323, the adder 324 and the quantizer 325 generate the intermediate fixed-point data x_R[n-1:0] according to the following expressions:
[Equation images not reproduced in this text.]
If the next-layer error value x_R does not fall within the first interval A, the two-way selector 323, the adder 324 and the quantizer 325 generate the intermediate fixed-point data x_R[n-1:0] according to the following expressions:
[Equation images not reproduced in this text.]
Finally, the bus converter 326 appends the flag bit flag to generate the complete error-value fixed-point data x_R[n:0].
In step 1404, the calculation unit 33 performs a multiplication operation on the error-value fixed-point data x_R[n:0] and the weight fixed-point data x2[n:0] to generate the gradient of the weight x2. The weight fixed-point data x2[n:0] may be quantized in the forward propagation flow (step 1305). In this embodiment, the calculation unit 33 has the structure of the multiplication unit 80 of fig. 8 and performs the multiplication on the error-value fixed-point data x_R[n:0] and the weight fixed-point data x2[n:0] to generate the gradient of the weight x2. As mentioned above, in the data structure of the fixed-point data of the present disclosure, the flag bit records the interval of the error-value fixed-point data x_R[n:0] and of the weight fixed-point data x2[n:0], the sign bit records their signs, and the value bits record their values V_R and V2. The multiplier 81 multiplies the signs and values of the error-value fixed-point data x_R[n:0] and the weight fixed-point data x2[n:0] to generate a signed value sign_t; the first selector 821 selects the first parameter multiplication value pm1 based on the segmentation parameter α_R and the flag bit x_R[n] of the error-value fixed-point data; the second selector 822 selects the second parameter multiplication value pm2 based on the second segmentation parameter α2 and the flag bit x2[n] of the weight fixed-point data; finally, the fixed-point to floating-point converter 83 executes the following expression to generate the gradient of the weight x2:
[Equation image not reproduced in this text.]
In step 1405, the quantization unit 32 performs the fixed-point calculation on the gradient to generate the current-layer error value. The detailed process of the fixed-point calculation is as described above and is not repeated here.
In step 1406, the control unit 31 adjusts the weight x2 according to the current-layer error value. The back propagation algorithm starts from the output layer of the neural network model and computes the gradients of the model layer by layer using the chain rule of differentiation, so as to adjust the weight x2.
Another embodiment of the present disclosure is a method for training a neural network. In general, the error value of the neural network is calculated using the output value obtained by forward propagation, the error value is then propagated back toward the input, and the weight of each layer is adjusted according to that layer's error value, so that the inference result of the neural network model comes closer to the actual situation. In other words, the training method of this embodiment performs the forward propagation process of fig. 13 to obtain the next-layer activation value, calculates the next-layer error value from that activation value, executes the back propagation process of fig. 14 to generate the current-layer error value, and then adjusts the weight according to the current-layer error value. Propagating back layer by layer in this manner yields an appropriate weight for each layer.
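By way of illustration only, the following Python sketch outlines the training loop just described (forward pass, error computation, back propagation, weight adjustment) on a single linear layer. Plain uniform rounding is used as a stand-in for the disclosed multi-interval quantizer, so the sketch illustrates the loop structure rather than the disclosed quantization itself; all names and values are illustrative.

import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))                     # weights of one layer
x = rng.normal(size=(8, 4))                     # input activations
t = rng.normal(size=(8, 3))                     # target values
lr = 0.01

def fake_quantize(a, step=2.0 ** -6):
    # Stand-in for the fixed-point conversion: round to a uniform step.
    return np.round(a / step) * step

for epoch in range(100):
    y = fake_quantize(x) @ fake_quantize(W)     # forward pass with quantized operands
    err = y - t                                 # error value at the output
    grad_W = fake_quantize(x).T @ fake_quantize(err) / len(x)   # back propagation
    W -= lr * grad_W                            # adjust the weights layer by layer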
Fig. 15 is a block diagram illustrating an integrated circuit device 1500 according to an embodiment of the disclosure. As shown in fig. 15, the integrated circuit device 1500 includes a computing device 1502, which is a neural network computing device according to any of the foregoing embodiments. Additionally, the integrated circuit device 1500 also includes a universal interconnect interface 1504 and other processing devices 1506.
The other processing device 1506 may be one or more of general purpose and/or special purpose processors such as a central processing unit, a graphics processing unit, an artificial intelligence processing unit, etc., and the number thereof is not limited but determined according to actual needs. The other processing device 1506 serves as an interface for the computing device 1502 to external data and controls, and performs basic controls including, but not limited to, data transfer, turning on and off the computing device 1502, and the like. Other processing devices 1506 may also cooperate with the computing device 1502 to perform computational tasks.
The universal interconnect interface 1504 may be used to transfer data and control instructions between the computing device 1502 and other processing devices 1506. For example, the computing device 1502 may obtain required input data from the other processing devices 1506 via the universal interconnect interface 1504 and write the input data to memory locations on the computing device 1502. Further, the computing device 1502 may obtain control instructions from the other processing devices 1506 via the universal interconnect interface 1504 to write to a control cache on the computing device 1502. Alternatively or in addition, the universal interconnect interface 1504 may also read data from a memory module of the computing device 1502 and transmit the data to the other processing device 1506.
The integrated circuit device 1500 also includes a storage device 1508 that can be coupled to the computing device 1502 and other processing devices 1506, respectively. The storage device 1508 is used to store data of the computing device 1502 and the other processing device 1506, and is particularly suitable for storing all data that cannot be stored in the internal storage of the computing device 1502 or the other processing device 1506.
According to different application scenarios, the integrated circuit device 1500 can be used as a System On Chip (SOC) for devices such as mobile phones, robots, unmanned aerial vehicles, video capture equipment, and the like, thereby effectively reducing the core area of the control part, increasing the processing speed, and reducing the overall power consumption. In this case, the universal interconnect interface 1504 of the integrated circuit device 1500 is connected to certain components of the apparatus, such as a camera, a display, a mouse, a keyboard, a network card or a WiFi interface.
The present disclosure also discloses a chip or an integrated circuit chip, which includes the integrated circuit device 1500. The present disclosure also discloses a chip package structure including the above chip.
Another embodiment of the present disclosure is a board card including the above chip package structure. Referring to fig. 16, in addition to the chips 1602 described above, the board 1600 may also include other components, including a memory device 1604, an interface device 1606, and a control device 1608.
The memory device 1604 is coupled to the chip 1602 within the chip package structure via a bus 1616 for storing data. Memory device 1604 may include multiple groups of memory cells 1610.
Interface device 1606 is electrically connected to chip 1602 within the chip package. The interface device 1606 is used for data transmission between the chip 1602 and an external device 1612 (e.g., a server or a computer). In this embodiment, the interface device 1606 is a standard PCIe interface, and the data to be processed is transmitted from the server to the chip 1602 through the standard PCIe interface, so as to implement data transfer. The results of the computations performed by chip 1602 are also transferred back to external device 1612 by interface device 1606.
The control device 1608 is electrically connected to the chip 1602 to monitor the state of the chip 1602. Specifically, the chip 1602 and the control device 1608 may be electrically connected through an SPI interface. The control device 1608 may include a single chip microprocessor ("MCU").
Another embodiment of the present disclosure is an electronic device or apparatus, which includes the board card 1600. According to different application scenarios, the electronic device or apparatus may include a data processing apparatus, a robot, a computer, a printer, a scanner, a tablet computer, a smart terminal, a mobile phone, a vehicle data recorder, a navigator, a sensor, a camera, a server, a cloud server, a camera, a video camera, a projector, a watch, an earphone, a mobile storage, a wearable device, a vehicle, a household appliance, and/or a medical device. The vehicles include airplanes, ships and/or cars; the household appliances include televisions, air conditioners, microwave ovens, refrigerators, electric cookers, humidifiers, washing machines, electric lamps, gas stoves and range hoods; the medical devices include nuclear magnetic resonance apparatuses, B-mode ultrasound apparatuses and/or electrocardiographs.
Another embodiment of the present disclosure is an electronic device comprising one or more processors and a memory, the memory having stored therein computer-executable instructions that, when executed by the one or more processors, cause the electronic device to perform the method as described above, in particular to perform the method as described in fig. 7, 9, 13 and 14.
Another embodiment of the disclosure is a computer-readable storage medium having stored thereon computer-executable instructions for computing data within a computing device, which when executed by one or more processors perform the method as described above, in particular the method as described in fig. 7, 9, 13 and 14.
The present disclosure provides a technical scheme for adaptive non-uniform low-bit fixed-point training. By introducing multiple quantization intervals, a non-uniform fixed-point data format and corresponding hardware, it solves the technical problem that the quantization bit width cannot be made low enough to effectively reduce energy consumption, and achieves the following technical effects:
1. The memory occupation of the network model is reduced, the resource consumption of model training is reduced, and the training speed is improved.
2. When the neural network model disclosed by the invention is used for reasoning, the reasoning speed can be accelerated, and the precision deterioration can be reduced.
3. When training the neural network model of the present disclosure, the problems of training collapse and non-convergence can be avoided.
The foregoing may be better understood in light of the following clauses:
Clause A1, a method of computing a neural network based on floating point data, comprising: presetting a plurality of quantization intervals, wherein each quantization interval corresponds to one quantization formula, and each quantization formula shows different quantized gradients; determining that the floating point data falls within a particular interval of the plurality of quantization intervals; quantizing the floating point data according to a quantization formula corresponding to the specific interval to generate fixed point data; and
calculating the neural network by using the fixed point data.
Clause a2, the method of clause a1, wherein the plurality of quantization intervals comprises a first interval and a second interval, the first interval being:
{x | x < 2^α·max{abs(D)}}
wherein x is the floating point data, α is a first segmentation parameter, and D is a data distribution of the floating point data.
Clause A3, the method of clause a2, wherein if the particular interval is the first interval, then the quantization formula is:
interval = 2^shift × 2^α
[The remaining expressions of this quantization formula are equation images not reproduced in this text.]
wherein x_q is the fixed-point data, the ceil function returns the smallest integer greater than or equal to the expression, and b is the quantization bit width.
Clause a4, the method of clause a2, wherein the second interval is:
{x | x ≥ 2^α·max{abs(D)}}
wherein x is the floating point data, α is the first segmentation parameter, and D is the data distribution.
Clause a5, the method of clause a4, wherein if the particular interval is the second interval, then the quantization formula is:
interval = 2^shift
[The remaining expressions of this quantization formula are equation images not reproduced in this text.]
wherein x_q is the fixed-point data, the ceil function returns the smallest integer greater than or equal to the expression, and b is the quantization bit width.
Clause a6, the method of clause a2, wherein the plurality of quantization intervals further comprises a third interval, the third interval being:
{x | x > 2^β·max{abs(D)}}
wherein β is a second segmentation parameter and is greater than α.
Clause a7, the method of clause a6, wherein if the particular interval is the second interval then the quantization formula is:
interval = 2^shift × 2^β
[The remaining expressions of this quantization formula are equation images not reproduced in this text.]
wherein x_q is the fixed-point data, the ceil function returns the smallest integer greater than or equal to the expression, and b is the quantization bit width.
Clause A8, the method of clause a6, wherein if the particular interval is the third interval, then the quantization formula is:
interval = 2^shift
[The remaining expressions of this quantization formula are equation images not reproduced in this text.]
wherein x_q is the fixed-point data, the ceil function returns the smallest integer greater than or equal to the expression, and b is the quantization bit width.
Clause a9, the method of clause a2, wherein the numerical distribution is a Gaussian distribution, the first interval being a positive-negative symmetric interval including 0.
Clause a10, the method of clause a1, wherein the precision of the plurality of quantization intervals is non-uniform.
Clause a11, the method of clause a1, wherein the floating point data is one of a weight, an activation value, and a gradient.
Clause a12, a method of computing a neural network based on floating point data, comprising: presetting a plurality of quantization intervals; determining that the floating point data falls within a particular interval of the plurality of quantization intervals; quantizing the floating point data according to the specific interval to generate fixed point data; setting a flag with N bits in a data structure of the fixed point data, wherein the flag bit records the specific interval, and N is a positive integer; and calculating the neural network using the fixed point data.
Clause a13, the method of clause a12, wherein the data structure further comprises: a sign bit for recording the sign of the fixed point data; and a numerical value bit for recording the numerical value of the fixed point data.
Clause a14, the method of clause a13, wherein the most significant bit of the data structure is the sign bit and the least significant bit is the flag bit.
Clause a15, the method of clause a12, wherein the plurality of quantization intervals is 2 intervals, N being equal to 1.
Clause a16, the method of clause a12, wherein the plurality of quantization intervals is 3 or 4 intervals, N equals 2.
Clause a17, an electronic device, comprising: one or more processors; and memory having stored therein computer-executable instructions that, when executed by the one or more processors, cause the electronic device to perform the method of any of clauses A1 to A16.
Clause a18, a computer-readable storage medium comprising computer-executable instructions that, when executed by one or more processors, perform the method of any of clauses A1 to A16.
The foregoing detailed description of the embodiments of the present disclosure has been presented for purposes of illustration and description; it is intended to be exemplary only and is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Meanwhile, for those skilled in the art, based on the ideas of the present disclosure, there may be variations in the specific embodiments and the application scope. In summary, the contents of this description should not be construed as limiting the present disclosure.

Claims (18)

1. A method of computing a neural network based on floating point data, comprising:
presetting a plurality of quantization intervals, wherein each quantization interval corresponds to one quantization formula, and each quantization formula shows different quantized gradients;
determining that the floating point data falls within a particular interval of the plurality of quantization intervals;
quantizing the floating point data according to a quantization formula corresponding to the specific interval to generate fixed point data; and
calculating the neural network by using the fixed point data.
2. The method of claim 1, wherein the plurality of quantization intervals comprises a first interval and a second interval, the first interval being:
{x | x < 2^α·max{abs(D)}}
wherein x is the floating point data, α is a first segmentation parameter, and D is a data distribution of the floating point data.
3. The method of claim 2, wherein if the specific interval is the first interval, the quantization formula is:
interval = 2^shift × 2^α
[The remaining expressions of this quantization formula are equation images not reproduced in this text.]
wherein x_q is the fixed-point data, the ceil function returns the smallest integer greater than or equal to the expression, and b is the quantization bit width.
4. The method of claim 2, wherein the second interval is:
{x | x ≥ 2^α·max{abs(D)}}
wherein x is the floating point data, α is the first segmentation parameter, and D is the data distribution.
5. The method of claim 4, wherein if the particular interval is the second interval, then the quantization formula is:
interval = 2^shift
[The remaining expressions of this quantization formula are equation images not reproduced in this text.]
wherein x_q is the fixed-point data, the ceil function returns the smallest integer greater than or equal to the expression, and b is the quantization bit width.
6. The method of claim 2, wherein the plurality of quantization intervals further comprises a third interval, the third interval being:
{x | x > 2^β·max{abs(D)}}
wherein β is a second segmentation parameter and is greater than α.
7. The method of claim 6, wherein if the particular interval is the second interval, then the quantization formula is:
interval = 2^shift × 2^β
[The remaining expressions of this quantization formula are equation images not reproduced in this text.]
wherein x_q is the fixed-point data, the ceil function returns the smallest integer greater than or equal to the expression, and b is the quantization bit width.
8. The method of claim 6, wherein if the specific interval is the third interval, the quantization formula is:
interval = 2^shift
[The remaining expressions of this quantization formula are equation images not reproduced in this text.]
wherein x_q is the fixed-point data, the ceil function returns the smallest integer greater than or equal to the expression, and b is the quantization bit width.
9. The method of claim 2, wherein the numerical distribution is a Gaussian distribution, the first interval being a positive-negative symmetric interval including 0.
10. The method of claim 1, wherein a precision of the plurality of quantization intervals is non-uniform.
11. The method of claim 1, wherein the floating point data is one of a weight, an activation value, and a gradient.
12. A method of computing a neural network based on floating point data, comprising:
presetting a plurality of quantization intervals;
determining that the floating point data falls within a particular interval of the plurality of quantization intervals;
quantizing the floating point data according to the specific interval to generate fixed point data;
setting a flag with N bits in a data structure of the fixed point data, wherein the flag bit records the specific interval, and N is a positive integer; and
calculating the neural network by using the fixed point data.
13. The method of claim 12, wherein the data structure further comprises:
a sign bit for recording the sign of the fixed point data; and
a numerical value bit for recording the numerical value of the fixed point data.
14. The method of claim 13, wherein the most significant bit of the data structure is the sign bit and the least significant bit is the flag bit.
15. The method of claim 12, wherein the plurality of quantization intervals are 2 intervals, N being equal to 1.
16. The method of claim 12, wherein the plurality of quantization intervals are 3 or 4 intervals, N being equal to 2.
17. An electronic device, comprising:
one or more processors; and
memory having stored therein computer-executable instructions that, when executed by the one or more processors, cause the electronic device to perform the method of any of claims 1-16.
18. A computer-readable storage medium comprising computer-executable instructions that, when executed by one or more processors, perform the method of any one of claims 1-16.
CN202010457215.1A 2020-05-26 2020-05-26 Neural network computing method and device, board card and computer readable storage medium Pending CN113723598A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010457215.1A CN113723598A (en) 2020-05-26 2020-05-26 Neural network computing method and device, board card and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN113723598A true CN113723598A (en) 2021-11-30

Family

ID=78672227

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010457215.1A Pending CN113723598A (en) 2020-05-26 2020-05-26 Neural network computing method and device, board card and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN113723598A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination