CN109902745A - CNN-based low-precision training and 8-bit integer quantization inference method - Google Patents
CNN-based low-precision training and 8-bit integer quantization inference method
- Publication number
- CN109902745A (application number CN201910154088.5A)
- Authority
- CN
- China
- Prior art keywords
- quantization
- integers
- integer
- low precision
- weight
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Image Analysis (AREA)
Abstract
The invention provides a CNN-based low-precision training and 8-bit integer quantization inference method. The main steps are as follows: low-precision model training: model training is performed with a 16-bit floating-point low-precision fixed-point algorithm to obtain a model for object detection; weight quantization: an 8-bit integer quantization scheme is proposed, and the weight parameters of the convolutional neural network are quantized layer by layer from 16-bit floating point to 8-bit integers; 8-bit integer quantization inference: the activation values are quantized to 8-bit integer data, i.e. each layer of the CNN receives int8 quantized input and produces int8 quantized output. The present invention trains the model with a 16-bit floating-point low-precision fixed-point algorithm to obtain the weights, then quantizes them to 8-bit integer data for forward inference. Compared with directly performing 8-bit integer quantization inference on weights obtained by training the model with a 32-bit floating-point algorithm, the method optimizes the inference process of the convolutional layers and effectively reduces the precision loss caused by low-bit fixed-point quantization inference.
Description
Technical field
The invention belongs to the technical field of convolutional neural networks, and in particular relates to a CNN-based low-precision training and 8-bit integer quantization inference method.
Background art
Convolutional neural networks (CNNs) have achieved outstanding results in fields such as image classification, object detection, and face recognition. However, due to the complexity and computational latency of the network structures, realizing real-time forward inference of CNNs on embedded platforms with relatively limited storage and computing resources requires compressing the model size and improving the computational efficiency of the model while keeping the precision loss under control.
A commonly used approach is to quantize the CNN weights and/or activation values, converting the data from 32-bit floating point to lower-bit integers. However, current quantization methods still fall short in the trade-off between precision and computational efficiency. Many quantization methods compress the network to varying degrees and save storage resources, but cannot effectively improve computational efficiency on hardware platforms. A large body of literature quantizes the weights, which effectively alleviates the shortage of storage resources on hardware platforms, but rarely addresses computational efficiency. Binary neural networks (BNN), ternary weight networks (TWN), and XNOR-Net implement multiplication through shift operations and improve computational efficiency on hardware platforms, but quantizing both the weights and the activation values to 1-bit or 2-bit representations usually leads to a sharp drop in precision. This places very strict demands on the model's performance tolerance of the quantization scheme, and is unsuitable for lightweight models with simple network structures intended for deployment on embedded platforms.
Low-bit data representations save hardware resources and greatly simplify the design of hardware accelerators. However, most studies perform high-precision 32-bit floating-point model training with GPU acceleration and apply low-precision quantization only during forward inference to speed up CNN inference. When extremely low-bit data representations are used, the heavy loss of parameter precision causes a significant drop in the model's object detection accuracy, so training a low-precision model is particularly important.
Summary of the invention
The invention aims to overcome the above-mentioned defects in the prior art, and proposes a CNN-based low-precision training and 8-bit integer quantization inference method, which solves the problems of large precision loss and insufficient computational efficiency in existing quantization methods.
In order to solve the above technical problems, the technical solution of the invention is as follows:
A CNN-based low-precision training and 8-bit integer quantization inference method comprises the following steps:
Low-precision model training: model training is performed with a 16-bit floating-point low-precision fixed-point algorithm to obtain a model for object detection, i.e. the weights.
Weight quantization: an 8-bit integer quantization scheme is proposed, and the weight parameters of the convolutional neural network are quantized layer by layer from 16-bit floating point to 8-bit integers.
8-bit integer quantization inference: the activation values are quantized to 8-bit integer data, i.e. each layer of the CNN receives int8 quantized input and produces int8 quantized output.
Further, the low-precision model training comprises training the model with GPU acceleration on a large server, with data saved in 16-bit floating point during the computation.
Further, for the 16-bit floating-point data in the computation, 2 bits store the integer part and 14 bits store the fractional part, and 14 bits of precision of the floating-point data are retained by rounding.
Further, the weight quantization comprises the proposed quantization scheme

q = Round((x − a)/(b − a) × 255 − 127)

where x denotes the floating-point data and a, b respectively denote the minimum and maximum values of the data in the array to be quantized, i.e. a := min(xi), b := max(xi); the quantized value q is then obtained with the rounding function Round().
Further, the weights are divided into a series of arrays by layer, the extreme values of each weight array are found, and the data within the same array are proportionally scaled to 8-bit integers.
Further, the 8-bit integer quantization inference process comprises the following steps:
(a) BN algorithm pre-processing: the mean and variance of the input samples are calculated before the convolution operation and a normalization pre-processing is performed, saving the computation time of the BN algorithm;
(b) integer convolution operation: the floating-point multiplications are converted into integer multiplications with the 8-bit integer quantization scheme, fitting the result of the floating-point multiplications as closely as possible, while the integer multiplications significantly improve computational efficiency;
(c) activation function optimization: the activation region [a, b] of each convolutional layer is chosen layer by layer, and the optimized activation function maps the convolution results to the known region [a, b], saving the computation time of quantizing the activation values.
Further, full-integer data are used for the calculations in the above inference process.
The invention has the following advantages and positive effects:
The present invention first trains the model with a 16-bit floating-point low-precision fixed-point algorithm to obtain the weights, then quantizes them to 8-bit integer data for forward inference. Compared with directly performing 8-bit integer quantization inference on weights obtained by training with a 32-bit floating-point algorithm, this effectively reduces the precision loss caused by low-bit fixed-point quantization inference. In addition, since the computational bottleneck of convolutional neural networks is the convolutional layers, an 8-bit integer quantization scheme is proposed to improve computational efficiency, and the inference process of the convolutional layers is optimized with this scheme: the BN algorithm pre-processing is performed first, reducing the computation time of the BN algorithm, followed by the convolution operation; the activation region of each convolutional layer is then determined by thousands of quantization inference experiments on the validation set, saving the time overhead of having to compute the extreme values of the activation values in real time after every activation function operation in order to quantize them. This method fits the floating-point forward inference process and improves the computational efficiency of the convolutional layers while keeping the precision loss under control.
Detailed description of the invention
Fig. 1 is tiny-YOLO network structure;
Fig. 2 is the flow chart that the present invention is applied to that tiny-YOLO network carries out model training, forward inference optimization;
Fig. 3 a is not carry out pretreated 8 integers of BN to quantify reasoning flow chart;
Fig. 3 b is that the present invention carries out the pretreated 8 integers quantization reasoning flow chart of BN;
Fig. 4 is overall procedure of the present invention.
Specific embodiment
It should be noted that, in the case of no conflict, the embodiments of the invention and the features in the embodiments can be combined with each other.
A detailed description of specific embodiments of the invention is provided below.
The technical solution of the present invention is divided into two stages: in the first stage, model training is performed with the 16-bit floating-point low-precision fixed-point algorithm to obtain a model for object detection, i.e. the weights; in the second stage, the weights are quantized with the 8-bit integer quantization scheme, the activation values are quantized to 8-bit integer data, and 8-bit integer quantization inference is realized.
The specific steps are as follows:
A. The model is trained with the 16-bit floating-point low-precision fixed-point algorithm in the <2.14> format, i.e. the integer part is represented with 2 bits and the fractional part with 14 bits, and 32-bit floating-point data are converted to 16-bit floating-point data by rounding. The model training parameters contain a large amount of data close to 0; retaining 14 bits of precision prevents many parameters from being rounded to 0, which would cause the loss of gradient information. The training error of this method essentially fits that of the 32-bit floating-point algorithm and does not significantly reduce the convergence rate.
B. Weight quantization: the weights obtained from model training are converted to 8-bit integer data. The mathematical definition of the 8-bit integer quantization scheme, i.e. the correspondence between the int8 representation (the quantized value is denoted by q) and the original floating-point representation (the original 32-bit floating-point value is denoted by x), is described by the following formula:

q = Round((x − a)/(b − a) × 255 − 127)

The 16-bit floating-point data are first proportionally scaled to floating-point values in [−127, 128], where a and b respectively denote the minimum and maximum values of the data in the array to be quantized, i.e. a := min(xi), b := max(xi); the quantized value q is then obtained by rounding, i.e. the 16-bit floating-point numbers are proportionally scaled to integers in the range [−127, 128].
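A minimal Python sketch of this per-array quantization scheme and its inverse (function names are illustrative; int16 is used for storage here because the stated range [−127, 128] includes 128, which does not fit in a signed 8-bit type):

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Proportionally scale a floating-point array to integers in [-127, 128]."""
    a, b = float(x.min()), float(x.max())          # extreme values of the array
    q = np.round((x - a) / (b - a) * 255.0 - 127.0)
    return q.astype(np.int16), a, b

def dequantize_int8(q: np.ndarray, a: float, b: float) -> np.ndarray:
    """Invert the mapping: x ~ (q + 127)/255 * (b - a) + a."""
    return (q.astype(np.float32) + 127.0) / 255.0 * (b - a) + a

w = np.float32([-1.0, -0.2, 0.5, 1.0])
q, a, b = quantize_int8(w)
print(q)    # [-127  -25   64  128]; e.g. x = 0.5 -> Round(191.25 - 127) = 64
```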
C. Batch normalization (BN) pre-processing is performed with the quantized weights and the input data of the convolutional layer. The BN algorithm is usually handled by a separate operation block after the convolution operation of each convolutional layer; the mathematical expression of the BN algorithm is given by the following formula:

Y = γ · (WX − μ)/√(δ² + ε) + β

where W is the weight matrix of the convolutional layer, X is the input of the convolutional layer, i.e. the feature map matrix, the scaling factor γ and the bias parameter β are the reconstruction parameters learned by the BN layer, which allow the network to recover the feature distribution learned during training, ε is a very small constant, μ is the mean of the feature map, and δ² is the variance of the feature map.
The following formula can be derived:

Y = [γ/√(δ² + ε)] · WX + [β − γμ/√(δ² + ε)]

which simplifies to:

Y = W_BN · X + β_BN

Defining W_BN = γW/√(δ² + ε) and β_BN = β − γμ/√(δ² + ε), W_BN and β_BN are first computed before each convolution operation and quantized to 8-bit integer data.
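A minimal Python sketch of this BN folding, assuming the standard per-output-channel BN formulation given above (names and shapes are illustrative; a 1x1 convolution is written as a matrix multiply for brevity):

```python
import numpy as np

def fold_bn(W, gamma, beta, mu, var, eps=1e-5):
    """Fold BN into the convolution so that
    gamma*(W@X - mu)/sqrt(var + eps) + beta == W_bn@X + beta_bn."""
    s = gamma / np.sqrt(var + eps)     # per-output-channel scale
    return W * s[:, None], beta - mu * s

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 9)).astype(np.float32)    # 4 output channels
X = rng.normal(size=(9, 16)).astype(np.float32)   # 16 spatial positions
gamma = rng.uniform(0.5, 1.5, 4).astype(np.float32)
beta = rng.normal(size=4).astype(np.float32)
mu = rng.normal(size=4).astype(np.float32)
var = rng.uniform(0.1, 2.0, 4).astype(np.float32)

W_bn, b_bn = fold_bn(W, gamma, beta, mu, var)
ref = gamma[:, None] * (W @ X - mu[:, None]) / np.sqrt(var + 1e-5)[:, None] + beta[:, None]
print(np.allclose(W_bn @ X + b_bn[:, None], ref, atol=1e-4))   # True
```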
D. The convolution operation is performed with the W_BN calculated in step C and the input (feature map) of the convolutional layer, i.e. Y = W_BN · X. According to the properties of matrix operations, the convolution operation is converted into the inner product operations of pairs of vectors, which facilitates the hardware implementation of the vector inner product with the DSP computing units of an FPGA. In the implementation, the convolution result is stored as int32, then the result is quantized to int8, and finally the β_BN calculated in step C is added.
Any output value w_i of the convolution operation is obtained by the inner product of two vectors υ_i = [x_i1, x_i2, …, x_in] and μ_i = [y_i1, y_i2, …, y_in]:

w_i = Σ_{j=1}^{n} x_ij · y_ij

where the vector length n is determined by the kernel size and c, the number of channels of the convolutional layer.
The mathematical definition of the quantization scheme in step B can be inverted as x = (q + 127)/255 · (b − a) + a. Substituting this into the inner product formula gives

w_i = Σ_{j=1}^{n} [(q_xj + 127)/255 · (b_x − a_x) + a_x] · [(q_yj + 127)/255 · (b_y − a_y) + a_y]

which can be simplified so that the accumulation Σ (q_xj + 127)(q_yj + 127) and the side sums Σ (q_xj + 127) and Σ (q_yj + 127) are computed as integer matrix multiplications, while the remaining scale and offset terms collapse into a factor B. Only B is floating-point, and since B depends only on the extreme values of the data it can be precomputed, achieving the purpose of converting the floating-point convolution operation into an integer convolution operation.
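A minimal Python sketch of this integer inner product following the expansion above (names are illustrative; the int32 accumulation is the only per-element work, and every floating-point term depends only on the precomputable extreme values):

```python
import numpy as np

def int_inner_product(qx, ax, bx, qy, ay, by):
    """Recover sum(x_j * y_j) from int8-quantized operands, where
    x = (q + 127)/255*(b - a) + a for each operand."""
    sx, sy = (bx - ax) / 255.0, (by - ay) / 255.0    # per-array scales
    px = qx.astype(np.int32) + 127                   # shifted to [0, 255]
    py = qy.astype(np.int32) + 127
    acc = np.dot(px, py)                             # integer accumulation
    # the remaining terms only need the precomputed extremes a, b
    return sx * sy * acc + sx * ay * px.sum() + sy * ax * py.sum() + qx.size * ax * ay

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 64).astype(np.float32)
y = rng.uniform(-2, 2, 64).astype(np.float32)
ax, bx, ay, by = x.min(), x.max(), y.min(), y.max()
qx = np.round((x - ax) / (bx - ax) * 255 - 127).astype(np.int16)
qy = np.round((y - ay) / (by - ay) * 255 - 127).astype(np.int16)
print(int_inner_product(qx, ax, bx, qy, ay, by), float(np.dot(x, y)))  # close
```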
E. Activation function optimization: the operation result of step D is activated to obtain the output of the convolutional layer. Thousands of quantization inference experiments are performed on the validation set, the pre-activation values in the inference process are collected starting from the first convolutional layer, a data distribution curve is fitted, and an appropriate activation range [a, b] is chosen layer by layer, ensuring that the precision loss is reduced as far as possible. Taking the Leaky activation function as an example, the Leaky_n activation function is designed so that, in addition to the Leaky response, its output is mapped into the chosen region [a, b].
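A minimal Python sketch of the Leaky and Leaky_n activations (the 0.1 negative slope is the darknet default, and the clipping form of Leaky_n is an illustrative assumption consistent with mapping the result into the known region [a, b]):

```python
import numpy as np

def leaky(x: np.ndarray) -> np.ndarray:
    """Leaky ReLU with darknet's default negative slope of 0.1."""
    return np.where(x >= 0, x, 0.1 * x)

def leaky_n(x: np.ndarray, a: float, b: float) -> np.ndarray:
    """Leaky activation whose output is confined to the chosen activation
    region [a, b], so the layer's quantization range is known in advance."""
    return np.clip(leaky(x), a, b)

y = np.float32([-3.0, -0.5, 0.2, 7.0])
print(leaky_n(y, a=-0.25, b=4.0))   # [-0.25 -0.05  0.2   4.  ]
```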
The following mainly illustrates the improvements of the present invention in model training and integer inference by taking the tiny-YOLO network performing real-time vehicle detection, an object detection application, as an example.
The tiny-YOLO network structure and the weight size and input feature map size of each convolutional layer are shown in Fig. 1. Tiny-YOLO has 15 layers in total, consisting of 9 convolutional layers and 6 pooling layers. The network structure is simple and the number of parameters is relatively small, so it is easier to deploy on an embedded platform for real-time object detection.
In this specific embodiment of the invention, the picture size is unified to 224*224 pixels. The input of the first convolutional layer is the pixel matrix of the picture to be detected in RGB format; a series of feature maps are output after the BN processing, convolution operation, and activation of the convolutional layer, and new feature maps are output through the first pooling layer. Each subsequent layer reads the output of the previous layer as its input feature maps and performs its operations. The last layer produces the object detection results, and its weight size is related to the object detection classes; this example only detects the single class "vehicle".
Before performing object detection with a CNN, the model needs to be trained with GPU acceleration on a large server, and the object detection task is then completed on the application platform with the trained weights. The present invention focuses on training a low-precision model and on accelerating the forward inference process of the CNN.
The flow chart of the present invention performing model training and forward inference optimization on the tiny-YOLO network is shown in Fig. 2. The specific implementation steps are:
1. 16-bit low-precision model training
Model training uses the darknet deep learning framework, and the input picture size is standardized to 224*224. The 16-bit low-precision model training scheme is used: the integer part of a floating-point number is represented with 2 bits, the fractional part with 14 bits, and 14 bits of precision of the floating-point data are retained by rounding.
Model training steps:
(1) First train the classification network with the ImageNet data set; the number of iterations is 1,600,000.
(2) Train the detection network with the public BDD100K driving data set; the number of iterations is 400,000. Since the tiny-YOLO model structure is simple and its generalization ability is poor, the object detection results are not ideal when the training set and test set come from different distributions; therefore the COCO data set commonly used for training detection networks is replaced by the BDD100K data set.
(3) Fine-tune the model with the customized DDS data set; the number of iterations is 400,000. The DDS data set is obtained, according to the actual application scenario, from road-condition videos in front of driving vehicles collected in domestic cities, by sampling the key frames of the videos and then annotating the classes.
2. Weight quantization: the weights W obtained by training are quantized from 16-bit floating point to 8-bit integer data with the 8-bit integer quantization scheme.
Steps 3-10 describe the process of completing one object detection task with the integer quantization inference scheme.
3. Input a validation set picture and obtain the pixel matrix of the image; the values are integers in the interval [0, 255] and are directly saved as 8-bit integer data, serving as the input sample X of the first convolutional layer.
Steps 4-8 describe the integer inference process of a convolutional layer. The process of applying the 8-bit integer quantization scheme directly to inference is shown in Fig. 3a, and the 8-bit integer quantization inference process with BN pre-processing is shown in Fig. 3b.
4. Calculate the mean and variance of the convolutional layer input sample X:

μ = (1/m) Σ_{i=1}^{m} x_i,  δ² = (1/m) Σ_{i=1}^{m} (x_i − μ)²

where m is the size of the mini-batch.
5. Calculate W_BN and β_BN and save them as 8-bit integer data, where W_BN = γW/√(δ² + ε) and β_BN = β − γμ/√(δ² + ε), completing the BN pre-processing.
6. Calculate Y = W_BN · X with the integer convolution method, save the convolution result as 32-bit integers, and add the offset parameter β_BN calculated in step 5.
7. Activate with the Leaky activation function, collect the activation values of the convolutional layer while detecting each picture, and fit the data distribution function of the activation values.
8. Find the maximum and minimum values of the activation values obtained in step 7, quantize the activation values to 8-bit integers, and obtain the output feature map of the convolutional layer.
9. Use the output feature map of step 8 as the input feature map of the pooling layer and perform the max pooling operation to generate new feature maps.
10. Repeat steps 4-9 according to the network structure of tiny-YOLO, obtain the detection results of the validation set pictures, and calculate the mean Average Precision (mAP) as the reference for the subsequent activation function optimization.
11. Choose a suitable activation region [a, b] for the first convolutional layer from the data distribution function obtained in step 7, and map the convolution results to the known region [a, b] with the activation function Leaky_n. Repeat steps 3-10 to obtain the object detection metric mAP after modifying the activation function of the first convolutional layer, controlling the mAP loss to less than 0.1%; step 8, i.e. the process of finding the extreme values of the activation values, can now be omitted.
Repeat step 11 to choose the activation regions of the 2nd to 9th convolutional layers in turn, modifying the activation function Leaky_n layer by layer, with the final mAP loss controlled to less than 1%.
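A minimal Python sketch of the per-layer range selection (the percentile-based choice of [a, b] is an illustrative assumption; the patent specifies fitting the activation distribution and accepting a region only if the mAP loss stays within the stated bound):

```python
import numpy as np

def choose_activation_range(samples: np.ndarray, coverage: float = 0.999):
    """Pick [a, b] covering most of the activations observed on the
    validation set; 'coverage' trades clipping error for resolution."""
    lo = (1.0 - coverage) / 2.0
    return float(np.quantile(samples, lo)), float(np.quantile(samples, 1.0 - lo))

# activation values collected for one convolutional layer during step 7
acts = np.random.default_rng(2).normal(0.0, 1.5, 100_000).astype(np.float32)
a, b = choose_activation_range(acts)
print(a, b)   # candidate region; keep it only if the mAP loss is < 0.1%
```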
12. Input a test set picture, complete the object detection task with the 8-bit integer quantization inference scheme with the optimized activation functions, and obtain the detection results.
Combining the above, the overall flow of the present invention is as shown in Fig. 4.
In conclusion the present invention is to obtain weight with the low precision fixed point algorithm training pattern of 16 floating types, then measure first
It turns to 8 integer datas and carries out forward inference, the weight that compared to 32 floating type algorithm training patterns obtain directly carries out 8
Position integer quantifies reasoning, effectively reduces low level fixed point quantization reasoning bring loss of significance.
In addition, to improve computational efficiency, proposing 8 integer quantities since the Calculation bottleneck of convolutional neural networks is convolutional layer
Change scheme, and using the reasoning process of 8 integer quantization scheme optimization convolutional layers, the pretreatment of BN algorithm is first carried out, BN is reduced
The calculating of algorithm is time-consuming, then carries out convolution algorithm.
The active region for testing each determining convolutional layer in the enterprising line number thousand times quantizations reasoning of verifying collection later, saves
Quantify activation value in reasoning process after each activation primitive operation to require first to seek the time overhead that activation value is most worth in real time.It should
Method is fitted floating-point arithmetic forward inference process, under the premise of controlling loss of significance, improves convolutional layer computational efficiency.
It is obvious to a person skilled in the art that the invention is not limited to the details of the above exemplary embodiments, and that the invention can be realized in other specific forms without departing from the spirit or essential attributes of the invention.
Therefore, in all respects, the embodiments are to be considered as illustrative and not restrictive. The scope of the invention is indicated by the appended claims rather than by the foregoing description, and it is intended that all changes falling within the meaning and scope of equivalent elements of the claims are included in the invention. Any reference sign in a claim should not be regarded as limiting the claim involved.
In addition, it should be understood that, although this specification is described in terms of embodiments, not every embodiment contains only one independent technical solution; this manner of description is merely for the sake of clarity, and those skilled in the art should consider the specification as a whole. The technical solutions in the various embodiments may also be suitably combined to form other embodiments understandable to those skilled in the art.
Claims (7)
1. A CNN-based low-precision training and 8-bit integer quantization inference method, characterized by comprising the following steps:
low-precision model training: performing model training with a 16-bit floating-point low-precision fixed-point algorithm to obtain a model for object detection;
weight quantization: proposing an 8-bit integer quantization scheme, and quantizing the weight parameters of the convolutional neural network layer by layer from 16-bit floating point to 8-bit integers;
8-bit integer quantization inference: quantizing the activation values to 8-bit integer data, i.e. each layer of the CNN receives int8 quantized input and produces int8 quantized output.
2. The CNN-based low-precision training and 8-bit integer quantization inference method according to claim 1, characterized in that the low-precision model training comprises training the model with GPU acceleration on a large server, with data saved in 16-bit floating point during the computation.
3. The CNN-based low-precision training and 8-bit integer quantization inference method according to claim 2, characterized in that, for the 16-bit floating-point data in the computation, 2 bits store the integer part and 14 bits store the fractional part, and 14 bits of precision of the floating-point data are retained by rounding.
4. The CNN-based low-precision training and 8-bit integer quantization inference method according to claim 1, characterized in that the weight quantization comprises the proposed quantization scheme

q = Round((x − a)/(b − a) × 255 − 127)

where x denotes the floating-point data and a, b respectively denote the minimum and maximum values of the data in the array to be quantized, i.e. a := min(xi), b := max(xi), and the quantized value q is then obtained with the rounding function Round().
5. The CNN-based low-precision training and 8-bit integer quantization inference method according to claim 4, characterized in that the weights are divided into a series of arrays by layer, the extreme values of each weight array are found, and the data within the same array are proportionally scaled to 8-bit integers.
6. The CNN-based low-precision training and 8-bit integer quantization inference method according to claim 1, characterized in that the 8-bit integer quantization inference process comprises the following steps:
(a) BN algorithm pre-processing: calculating the mean and variance of the input samples before the convolution operation and performing a normalization pre-processing;
(b) integer convolution operation: converting the floating-point multiplications into integer multiplications with the 8-bit integer quantization scheme;
(c) activation function optimization: choosing the activation region [a, b] of each convolutional layer layer by layer, the optimized activation function mapping the convolution results to the known region [a, b].
7. The CNN-based low-precision training and 8-bit integer quantization inference method according to any one of claims 1 to 6, characterized in that full-integer data are used for the calculations in the inference process.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910154088.5A CN109902745A (en) | 2019-03-01 | 2019-03-01 | CNN-based low-precision training and 8-bit integer quantization inference method
Publications (1)
Publication Number | Publication Date |
---|---|
CN109902745A true CN109902745A (en) | 2019-06-18 |
Family
ID=66946069
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20190618 |