
CN108197707A - Compression method of convolutional neural network based on global error reconstruction - Google Patents

Compression method of convolutional neural network based on global error reconstruction

Info

Publication number
CN108197707A
Authority
CN
China
Prior art keywords
layer
compression
convolutional neural
error
global error
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711494011.XA
Other languages
Chinese (zh)
Inventor
纪荣嵘 (Rongrong Ji)
林绍辉 (Shaohui Lin)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen University
Original Assignee
Xiamen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen University
Priority to CN201711494011.XA
Publication of CN108197707A
Legal status: Pending


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/082 - Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/048 - Activation functions

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

Compression method of a convolutional neural network based on global error reconstruction, relating to the compression of deep neural networks. Aiming at the shortcoming that traditional intra-layer compression techniques based on low-rank decomposition cannot achieve a high-precision classification effect, the method considers the various nonlinear relations between layers and the joint optimization of parameters across layers, replaces single-layer optimization, constructs a global error minimization scheme, and thereby provides a compression method for convolutional neural networks based on global error reconstruction. The method includes the following steps: 1) without considering the nonlinear activation function, initially compress the model size using a low-rank decomposition of the linear responses within each layer; 2) using the low-rank decomposition of the matrix within a network layer, and considering the influence of the nonlinear activation on a single layer, establish a nonlinear in-layer compression optimization to improve the precision of matrix compression in the nonlinear layer; 3) since the in-layer compression error grows as it accumulates layer by layer, construct a global error reconstruction to improve the global discriminability of the compressed model.

Description

Compression method of convolutional neural network based on global error reconstruction
Technical Field
The invention relates to the compression of deep neural networks, and in particular to a compression method for convolutional neural networks based on global error reconstruction.
Background
In recent years, with the rapid development of GPU hardware and the advent of the big-data era, deep learning has developed rapidly and has been pursued across the fields of artificial intelligence, including speech recognition, image recognition, video tracking, and natural language processing. Deep learning breaks through traditional technical methods and greatly improves recognition performance in these fields; in particular, the strong self-learned feature representation capability of Convolutional Neural Networks (CNNs) has led to their wide application in image recognition [1-4], object detection [5-7], image retrieval [8], and other fields. If the powerful recognition performance of convolutional neural networks could be transplanted into mobile embedded devices (such as mobile phones, robots, unmanned aerial vehicles, and intelligent recognition glasses), it would play a significant role in military applications such as rescue and disaster relief and enemy reconnaissance, and in civil applications such as mobile intelligent recognition and convenient travel; it would also help improve the security and deterrence of a national defense system, and has important significance for winning at minimum cost in modern military confrontation.
As the performance of convolutional neural network models increases, the models become deeper and deeper, and the accompanying disadvantage of high storage cost severely restricts applications in resource-limited environments, especially on intelligent mobile embedded devices. For example, to classify a color image with a resolution of 224 × 224, the 8-layer AlexNet [1] contains 600,000 network nodes and 61M network parameters and takes 240 MB of memory storage; this storage overhead grows as the model deepens. For the same task, the 16-layer VGGNet [2] contains 1,500,000 network nodes and 144M network parameters, taking 528 MB of memory storage, and face recognition with DeepFace [9] needs 120M network parameters and requires 475 MB of memory storage. Given the weak storage capability of mobile terminal devices, such huge deep networks cannot be stored and run directly. On the one hand, convolutional neural network models with millions of parameters store a large amount of redundant information, so not all parameters and structures contribute to the high discriminability of CNNs. On the other hand, shallow or simple CNNs cannot approach the performance of these large models. Therefore, compressing the original convolutional network model so that it can be deployed directly on intelligent mobile embedded devices is an effective solution.
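These storage figures follow directly from the parameter counts; as a rough sanity check, the short sketch below estimates checkpoint size assuming every parameter is stored as a dense 32-bit float (the small gaps versus the quoted sizes come from how the published counts are rounded and from what each checkpoint actually contains):

```python
def model_storage_mb(num_params: float, bytes_per_param: int = 4) -> float:
    """Storage footprint in MB for dense 32-bit floating-point weights."""
    return num_params * bytes_per_param / 1e6

# Parameter counts quoted above (approximate):
for name, params in [("AlexNet", 61e6), ("VGGNet", 144e6), ("DeepFace", 120e6)]:
    print(f"{name}: ~{model_storage_mb(params):.0f} MB")
# AlexNet: ~244 MB, VGGNet: ~576 MB, DeepFace: ~480 MB
```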
At present, most convolutional neural network compression relies on three approaches: parameter sharing, parameter pruning, and matrix decomposition. For parameter sharing, documents [10,11] reduce the redundancy of the parameter space by vector-quantizing or product-quantizing the parameters in the network. Document [12] proposes HashedNets, a new network structure in which a low-cost hash function maps different parameters of a fully connected layer in CNNs into the same hash bucket, and the memory overhead of the model is reduced by sharing the parameters after hash mapping. Document [13] replaces the conventional linear mapping with a circulant mapping, thereby reducing memory overhead and speeding up matrix computation with fast Fourier transforms. For parameter pruning, by exploring redundant information inside neurons, document [14] finds and merges neurons with similar saliency, and proposes a parameter pruning method that is completely independent of data. More recently, methods relying directly on parameter magnitude have been used to decide whether a parameter should be deleted: documents [15,16] delete network parameters with small L1-norm values, thereby reducing the number of parameters of the model. For matrix decomposition, document [17] compresses the weights of each layer of the network using a low-rank factorization technique. Document [18] converts the original weight matrix into a Tensor Train format representation, reducing the number of parameters while preserving the expressive power of the model. However, these methods only consider compression within a layer and do not explicitly account for the overall classification accuracy of the model; because a large number of nonlinear activation functions exist in a convolutional network, compression confined to a local layer leads to a large final classification error after the nonlinear transformations. A global, explicit compression technique that restores the original network's classification accuracy is therefore the focus of research.
Reference documents:
[1]. A. Krizhevsky, I. Sutskever, G. E. Hinton. Imagenet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems. 2012: 1097-1105.
[2]. K. Simonyan, A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
[3]. C. Szegedy, W. Liu, Y. Jia, et al. Going deeper with convolutions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015: 1-9.
[4]. K. He, X. Zhang, S. Ren, et al. Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 770-778.
[5]. R. Girshick, J. Donahue, T. Darrell, et al. Rich feature hierarchies for accurate object detection and semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014: 580-587.
[6]. R. Girshick. Fast R-CNN. Proceedings of the IEEE International Conference on Computer Vision. 2015: 1440-1448.
[7]. S. Ren, K. He, R. Girshick, et al. Faster R-CNN: Towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems. 2015: 91-99.
[8]. Y. Gong, L. Wang, R. Guo, et al. Multi-scale orderless pooling of deep convolutional activation features. European Conference on Computer Vision. 2014: 392-407.
[9]. Y. Taigman, M. Yang, M. Ranzato, et al. DeepFace: Closing the gap to human-level performance in face verification. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014: 1701-1708.
[10]. Y. Gong, L. Liu, M. Yang, et al. Compressing deep convolutional networks using vector quantization. arXiv preprint arXiv:1412.6115, 2014.
[11]. J. Wu, C. Leng, Y. Wang, et al. Quantized convolutional neural networks for mobile devices. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 4820-4828.
[12]. W. Chen, J. Wilson, S. Tyree, et al. Compressing neural networks with the hashing trick. International Conference on Machine Learning. 2015: 2285-2294.
[13]. Y. Cheng, F. Yu, R. Feris, et al. An exploration of parameter redundancy in deep networks with circulant projections. Proceedings of the IEEE International Conference on Computer Vision. 2015: 2857-2865.
[14]. S. Srinivas, R. Babu. Data-free parameter pruning for deep neural networks. arXiv preprint arXiv:1507.06149, 2015.
[15]. S. Han, J. Pool, J. Tran, et al. Learning both weights and connections for efficient neural networks. Advances in Neural Information Processing Systems. 2015: 1135-1143.
[16]. S. Han, H. Mao, W. J. Dally. Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding. arXiv preprint arXiv:1510.00149, 2015.
[17]. M. Denil, B. Shakibi, L. Dinh, et al. Predicting parameters in deep learning. Advances in Neural Information Processing Systems. 2013: 2148-2156.
[18]. A. Novikov, D. Podoprikhin, A. Osokin, et al. Tensorizing neural networks. Advances in Neural Information Processing Systems. 2015: 442-450.
Disclosure of Invention
The invention aims to provide a compression method for convolutional neural networks based on global error reconstruction. Aiming at the defect that traditional intra-layer compression techniques based on low-rank decomposition cannot obtain a high-precision classification effect, it considers the various nonlinear relations between layers and the joint optimization of parameters across layers, replaces single-layer optimization, and constructs a global error minimization optimization scheme.
The invention comprises the following steps (a minimal sketch of step 1 follows the list):
1) without considering the nonlinear activation function, the model size is initially compressed using a low-rank decomposition of the linear responses within each layer;
2) using the low-rank decomposition of the matrix within a network layer, and taking into account the influence of the nonlinear activation on a single layer, a nonlinear in-layer compression optimization is established to improve the compression precision of the matrix in the nonlinear layer;
3) since the error of in-layer compression grows as it accumulates layer by layer, a global error reconstruction is constructed to improve the global discriminability of the compressed model.
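No reference implementation accompanies the patent, so the following is only a minimal sketch of step 1: a rank-k truncated SVD of one layer's dense weight matrix, ignoring the activation function. The layer size, rank, and helper name are illustrative.

```python
import numpy as np

def low_rank_init(W: np.ndarray, k: int):
    """Rank-k truncated SVD of a layer's weight matrix W (step 1).

    Returns factors P (m x k) and Q (k x n) with W ~= P @ Q, replacing
    the m*n stored values with k*(m + n).
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    P = U[:, :k] * s[:k]   # absorb the singular values into one factor
    Q = Vt[:k, :]
    return P, Q

# Hypothetical 4096 x 4096 fully connected layer compressed to rank 256:
W = np.random.randn(4096, 4096).astype(np.float32)
P, Q = low_rank_init(W, k=256)
print(W.size, P.size + Q.size)   # ~16.8M values vs. ~2.1M values
```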
The invention has the following outstanding advantages:
1) The invention overcomes the defect that low-rank decomposition within a single layer greatly reduces the discrimination precision of the compressed model. It considers not only the redundant information within a layer but also the nonlinear activation functions and the inter-layer dependencies of the convolutional neural network, and constructs a convolutional neural network compression method based on global error reconstruction. The required equipment is modest: an NVIDIA TITAN X GPU graphics card is used to accelerate the training of the network, and an ordinary PC suffices for testing and storing the compressed convolutional neural network.
2) The method modifies the original intra-layer low-rank decomposition technique by taking the inter-layer correlation into account and introducing a global reconstruction error for the model, which improves the recognition precision of the compressed model. The method can compress the classical AlexNet and VGGNet models with compression ratios of 15 times and 16 times, respectively, while the classification accuracy of the compressed model drops by less than 1 percent compared with the original model.
3) The compressed model can be embedded directly into a mobile phone or other mobile device, greatly reducing the computation and memory footprint of the original model. Implanting a high-performance deep learning model into mobile embedded devices has broad application prospects in military fields such as rescue and disaster relief and enemy reconnaissance, and in civil fields such as mobile intelligent recognition and convenient travel.
Drawings
FIG. 1 is a flow chart of the convolutional neural network compression method based on global error reconstruction;
FIG. 2 shows the relationship between the compression ratio of AlexNet and the Top-1 classification error for different compression methods;
FIG. 3 shows the relationship between the compression ratio of AlexNet and the Top-5 classification error for different compression methods;
FIG. 4 shows the relationship between the compression ratio of VGGNet-19 and the Top-1 classification error for different compression methods;
FIG. 5 shows the relationship between the compression ratio of VGGNet-19 and the Top-5 classification error for different compression methods;
FIG. 6 shows the classification results of different algorithms on a subset of ImageNet pictures.
Detailed Description
The following examples will further illustrate the present invention with reference to the accompanying drawings.
Aiming at the defect that the traditional intra-layer compression technique based on low-rank decomposition cannot obtain a high-precision classification effect, the invention considers the various nonlinear relations between layers and the joint optimization of parameters across layers, replaces single-layer optimization, constructs a global error minimization optimization scheme, and designs a global, explicit convolutional neural network compression method. The specific algorithm flow is shown in FIG. 1.
Each module is specifically as follows:
1. Nonlinear expansion
A large number of nonlinear activation functions exist in a convolutional neural network. Taking into account the influence of the nonlinear transformation on the linear low-rank approximation, a reconstruction-error optimization function under the nonlinear ReLU transform $r(x)=\max(0,x)$ is established:

$$\min_{\widetilde{W}_l} \left\| r(W_l X_l) - r(\widetilde{W}_l \widetilde{X}_l) \right\|_F^2, \quad \text{s.t.} \quad \operatorname{rank}(\widetilde{W}_l) \le k, \qquad (1)$$

where $X_l$ is the input of the original network at the $l$-th layer and $\widetilde{X}_l$ is the approximate input to the $l$-th layer. A low-rank approximate matrix $\widetilde{W}_l$ for each layer is obtained by optimizing equation (1) layer by layer, from the lowest layer to the highest. Once the matrix of layer $l-1$ has been optimized, $\widetilde{X}_l = r(\widetilde{W}_{l-1}\widetilde{X}_{l-1})$ is a fixed constant; writing $Y_l = W_l X_l$ for the original response of the layer, equation (1) can be rewritten as:

$$\min_{\widetilde{W}_l} \left\| r(Y_l) - r(\widetilde{W}_l \widetilde{X}_l) \right\|_F^2, \quad \text{s.t.} \quad \operatorname{rank}(\widetilde{W}_l) \le k. \qquad (2)$$

By introducing an intermediate variable $Z_l$, equation (2) is equivalently transformed into

$$\min_{\widetilde{W}_l, Z_l} \left\| r(Y_l) - r(Z_l) \right\|_F^2 + \lambda \left\| Z_l - \widetilde{W}_l \widetilde{X}_l \right\|_F^2, \quad \text{s.t.} \quad \operatorname{rank}(\widetilde{W}_l) \le k, \qquad (3)$$

where $\lambda$ is a penalty factor and $Z_l$ is an intermediate variable of the same size as $\widetilde{W}_l \widetilde{X}_l$. Equation (3) can be solved by alternating optimization, and its solution is equivalent to solving the original equation (1).

1) Fixing $Z_l$, updating $\widetilde{W}_l$: with $Z_l$ fixed, equation (3) reduces to

$$\min_{\widetilde{W}_l} \sum_i \left\| z_i - \widetilde{W}_l \tilde{x}_i \right\|_2^2, \quad \text{s.t.} \quad \operatorname{rank}(\widetilde{W}_l) \le k, \qquad (4)$$

which is written in matrix form as:

$$\min_{\widetilde{W}_l} \left\| Z_l - \widetilde{W}_l \widetilde{X}_l \right\|_F^2, \quad \text{s.t.} \quad \operatorname{rank}(\widetilde{W}_l) \le k. \qquad (5)$$

Let $M_l$ be the solution of equation (5) without the rank constraint. Carrying out a GSVD decomposition on $M_l$ gives $M_l = U_l S_l V_l^{\mathsf T}$, and thus the optimum value $\widetilde{W}_l = U_l^{(k)} S_l^{(k)} \bigl(V_l^{(k)}\bigr)^{\mathsf T}$ is obtained, where $U_l^{(k)}$, $S_l^{(k)}$, $V_l^{(k)}$ are formed from the first $k$ columns (respectively the first $k$ singular values) of $U_l$, $S_l$, $V_l$.

2) Fixing $\widetilde{W}_l$, updating $Z_l$: instead of optimizing all of $Z_l$ at once, each element of $Z_l$ is optimized separately, so equation (3) is written as a one-dimensional optimization problem:

$$\min_{z_{ij}} \bigl( r(y_{ij}) - r(z_{ij}) \bigr)^2 + \lambda \bigl( z_{ij} - p_{ij} \bigr)^2, \qquad (6)$$

where $y_{ij}$ is the $(i,j)$-th element of $Y_l$ and $p_{ij}$ is the $(i,j)$-th element of $\widetilde{W}_l \widetilde{X}_l$. Considering the constraint imposed by the ReLU nonlinearity, the optimal solution is obtained by treating the positive and negative cases separately, namely the two candidates

$$z_{ij}^{+} = \frac{r(y_{ij}) + \lambda p_{ij}}{1 + \lambda}, \qquad z_{ij}^{-} = \min(0,\, p_{ij}).$$

If $z_{ij}^{+} \ge 0$ and it makes the value of the optimization function in equation (6) smaller than $z_{ij}^{-}$ does, then $z_{ij} = z_{ij}^{+}$; otherwise $z_{ij} = z_{ij}^{-}$. The specific nonlinear expansion algorithm is shown in Table 1; the penalty factor is set to $\lambda = 1$ in the present invention. By carrying out nonlinear expansion optimization on the parameters of each fully connected layer, the initial compressed parameters $\{\widetilde{W}_l\}$ are obtained.
TABLE 1 Nonlinear expansion algorithm
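A minimal NumPy sketch of the alternating optimization of equation (3) follows. Note the assumptions: a plain truncated SVD stands in for the GSVD step of the patent, the iteration count is fixed rather than driven by convergence, and all function names are our own.

```python
import numpy as np

def relu(a):
    return np.maximum(a, 0.0)

def truncated_svd(M, k):
    """Rank-k approximation of M (a plain SVD standing in for the GSVD)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k]

def nonlinear_expansion(Y, X_tilde, k, lam=1.0, iters=20):
    """Alternating optimization of equation (3) for one layer.

    Y:       original responses W_l X_l  (d_out x n)
    X_tilde: approximate inputs to the layer (d_in x n)
    Returns a rank-k W_tilde with relu(W_tilde @ X_tilde) ~= relu(Y).
    """
    Z = Y.copy()
    for _ in range(iters):
        # W-step (eqs. 4-5): rank-constrained least squares Z ~= W X_tilde
        W = truncated_svd(Z @ np.linalg.pinv(X_tilde), k)
        # Z-step (eq. 6): element-wise closed form under the ReLU
        P = W @ X_tilde
        z_pos = (relu(Y) + lam * P) / (1.0 + lam)   # candidate with z >= 0
        z_neg = np.minimum(P, 0.0)                  # candidate with z <= 0
        f_pos = (relu(Y) - relu(z_pos)) ** 2 + lam * (z_pos - P) ** 2
        f_neg = (relu(Y) - relu(z_neg)) ** 2 + lam * (z_neg - P) ** 2
        Z = np.where((z_pos >= 0) & (f_pos < f_neg), z_pos, z_neg)
    return W
```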
2. Global reconstruction error
The invention uses a global error reconstruction method to reduce the influence of the initial compression on the final classification error of the network. The specific formulation is as follows:

$$\min_{\{\widetilde{W}_l\}} \left\| y - \tilde{y} \right\|_2^2, \qquad (7)$$

where $y$ is the (non-approximated) output of the original network and $\tilde{y}$ is the output of the compressed network, whose $m-1$ hidden layers and output layer carry the parameter matrices $\widetilde{W}_0, \ldots, \widetilde{W}_{m-1}$, namely $\tilde{y} = r\bigl(\widetilde{W}_{m-1}\, r(\cdots r(\widetilde{W}_0\, x))\bigr)$, where $x$ is the input vector. Writing equation (7) in matrix form over the whole input matrix $X$, and approximating the decomposition result by the nonlinearly expanded parameter matrices, gives:

$$\min_{\{\widetilde{W}_l\}} \left\| Y - r\bigl(\widetilde{W}_{m-1}\, r(\cdots r(\widetilde{W}_0\, X))\bigr) \right\|_F^2, \qquad (8)$$

where $l = 0, 1, \ldots, m-1$. The gradient information of each layer is obtained with the BP (back-propagation) algorithm, and the parameters are then updated by stochastic gradient descent. Denoting the compressed activations by $\widetilde{H}_0 = X$, $A_l = \widetilde{W}_l \widetilde{H}_l$ and $\widetilde{H}_{l+1} = r(A_l)$, the specific gradient of each layer is:

$$\frac{\partial L}{\partial \widetilde{W}_l} = \delta_l\, \widetilde{H}_l^{\mathsf T}, \qquad (9)$$

where the error information $\delta_l$ of the loss function $L = \tfrac{1}{2}\|\widetilde{H}_m - Y\|_F^2$ can be obtained as:

$$\delta_{m-1} = (\widetilde{H}_m - Y) \odot r'(A_{m-1}), \qquad \delta_l = \bigl(\widetilde{W}_{l+1}^{\mathsf T}\, \delta_{l+1}\bigr) \odot r'(A_l), \qquad (10)$$

where $\odot$ denotes the element-wise product and $r'(\cdot)$ is the derivative of the ReLU function, equal to 1 for positive pre-activations and 0 otherwise.
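To illustrate how equations (8) to (10) drive the fine-tuning, here is a minimal NumPy sketch of one stochastic-gradient step over a batch. The all-ReLU fully connected structure, the learning rate, and the function names are assumptions for the example, not the patented implementation.

```python
import numpy as np

def relu(a):
    return np.maximum(a, 0.0)

def relu_grad(a):
    return (a > 0).astype(a.dtype)

def ger_sgd_step(W, X, Y, lr=1e-3):
    """One SGD step on the global reconstruction error of equation (8).

    W: list of compressed weight matrices [W_0, ..., W_{m-1}], updated in place
    X: input batch (d_0 x n);  Y: the original network's output for X
    """
    # Forward pass through the compressed network, caching pre-activations
    H, A = [X], []
    for Wl in W:
        A.append(Wl @ H[-1])
        H.append(relu(A[-1]))
    # Backward pass: error signals of equation (10)
    grads = [None] * len(W)
    delta = (H[-1] - Y) * relu_grad(A[-1])
    for l in reversed(range(len(W))):
        grads[l] = delta @ H[l].T                      # gradient of eq. (9)
        if l > 0:
            delta = (W[l].T @ delta) * relu_grad(A[l - 1])
    # Stochastic gradient descent update
    for Wl, g in zip(W, grads):
        Wl -= lr * g
    return W
```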
the specific experimental results are as follows:
in recent years, with the rapid development of hardware GPUs and the advent of a big data era, deep learning has been rapidly developed, and various fields of artificial intelligence, including the fields of graphics, text and video including voice recognition, image recognition, video tracking, natural voice processing and the like, have been pursued. The deep learning technology breaks through the traditional technical method, greatly improves the recognition performance of various fields, particularly the strong self-feature representation capability of a convolutional neural network, and is widely applied to the fields of image recognition, target detection, image retrieval and the like. However, due to the disadvantage of high storage of the model of the convolutional neural network, the model cannot be directly embedded into a mobile device end with a limited storage space. Therefore, the powerful identification performance of the convolutional neural network is transplanted to mobile embedded equipment (such as mobile phones, robots, unmanned planes, intelligent identification glasses and the like), and the powerful identification performance plays a significant role in rescue and disaster relief in the military aspect, enemy exploration, mobile intelligent identification in the civil aspect, convenience for people to go out and the like.
Compression of convolutional neural networks has been an important direction in computer vision and artificial intelligence in recent years. Traditional compression methods only approximate the redundant information within a model and do not consider the relations between network layers or the propagation and accumulation of errors, so the classification accuracy of the compressed model is low. The invention provides a convolutional neural network compression method based on global error reconstruction, which can effectively recover the precision of the compressed model; the compressed model can be embedded directly into a mobile device and used for tasks such as object recognition and detection.
The experiments of the invention are carried out on AlexNet and VGGNet-19 using the ImageNet dataset. FIGS. 2 to 5 show that, compared with the state-of-the-art LRD, BIN, PQ, and AS algorithms, as well as GER-IC (a variant that adopts nonlinear expansion but does not introduce global error reconstruction), the compression method of the invention (GER) compresses the convolutional neural network better while incurring a smaller increase in error. FIG. 6 shows the classification results of the different algorithms on a subset of ImageNet pictures.
Table 2 reports the relationship between compression ratio and classification error at fixed compression ratios. As can be seen from Table 2, by introducing global error reconstruction, the algorithm of the invention achieves the best overall performance compared with the other compression methods.
TABLE 2 comparison between Classification error rates for each algorithm at compression ratios of 16 and 32
The formulas are explained below (the variables and symbols are as defined in the corresponding formula expressions above):
Equation (1) defines a reconstruction-error optimization function under the nonlinear ReLU transform, whose purpose is to reduce the discrepancy between the original output of each layer and the output of the compressed model.
Equation (2) is an equivalent rewriting of equation (1).
Equation (3) introduces the intermediate variable $Z_l$ in order to optimize the objective function more easily.
Equations (4) and (5) are the transformed problem obtained after fixing the variable $Z_l$; their purpose is to solve for the approximate optimized parameters $\widetilde{W}_l$.
Equation (6) is the element-wise representation of equation (3), whose purpose is to optimize each element of $Z_l$.
Equation (7) is the objective function of global error reconstruction, whose purpose is to obtain a globally optimal parameter approximation to the original model.
Equation (8) is the matrix-form approximation of equation (7); expressing it in matrix form facilitates the computation and optimization of the model.
Equation (9) gives the gradient value of the global error reconstruction function with respect to each model parameter, used by the stochastic gradient descent method to update the parameters and obtain the optimal parameter values.
Equation (10) gives the error signal of the global reconstruction error function with respect to the hidden-layer activation values of each layer, used to obtain the gradient information of each parameter.
The English terms are defined as follows:
Top-1 means the model's classification is correct when the class with the highest output probability is the correct label.
Top-5 means the model's classification is correct when the correct label is among the five classes with the highest output probabilities.
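As an illustration of these two definitions, a small helper for computing Top-k accuracy (and hence the Top-1 and Top-5 errors plotted in FIGS. 2 to 5) might look like the following; the function name and array shapes are our own.

```python
import numpy as np

def topk_accuracy(probs: np.ndarray, labels: np.ndarray, k: int) -> float:
    """Fraction of samples whose true label is among the k classes with
    the highest predicted probability. probs: (n, num_classes), labels: (n,)."""
    topk = np.argsort(probs, axis=1)[:, -k:]      # indices of the k largest
    hits = (topk == labels[:, None]).any(axis=1)
    return float(hits.mean())

# Top-1 and Top-5 error rates:
#   top1_err = 1 - topk_accuracy(probs, labels, 1)
#   top5_err = 1 - topk_accuracy(probs, labels, 5)
```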

Claims (1)

1. A method for compressing a convolutional neural network based on global error reconstruction, characterized by comprising the following steps:
1) without considering the nonlinear activation function, initially compressing the model size using a low-rank decomposition of the linear responses within each layer;
2) using the low-rank decomposition of the matrix within a network layer, and taking into account the influence of the nonlinear activation on a single layer, establishing a nonlinear in-layer compression optimization to improve the compression precision of the matrix in the nonlinear layer;
3) since the error of in-layer compression grows as it accumulates layer by layer, constructing a global error reconstruction to improve the global discriminability of the compressed model.
CN201711494011.XA 2017-12-31 2017-12-31 Compression method of convolutional neural network based on global error reconstruction Pending CN108197707A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711494011.XA CN108197707A (en) 2017-12-31 2017-12-31 Compression method of convolutional neural network based on global error reconstruction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711494011.XA CN108197707A (en) 2017-12-31 2017-12-31 Compression method of convolutional neural network based on global error reconstruction

Publications (1)

Publication Number Publication Date
CN108197707A true CN108197707A (en) 2018-06-22

Family

ID=62587419

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711494011.XA Pending CN108197707A (en) 2017-12-31 2017-12-31 Compression method of convolutional neural network based on global error reconstruction

Country Status (1)

Country Link
CN (1) CN108197707A (en)


Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109063834A (en) * 2018-07-12 2018-12-21 浙江工业大学 A kind of neural networks pruning method based on convolution characteristic response figure
CN109063834B (en) * 2018-07-12 2021-07-20 浙江工业大学 Neural network pruning method based on convolution characteristic response graph
CN108920720A (en) * 2018-07-30 2018-11-30 电子科技大学 The large-scale image search method accelerated based on depth Hash and GPU
US11775812B2 (en) * 2018-11-30 2023-10-03 Samsung Electronics Co., Ltd. Multi-task based lifelong learning
CN111723901A (en) * 2019-03-19 2020-09-29 百度在线网络技术(北京)有限公司 Training method and device of neural network model
CN111723901B (en) * 2019-03-19 2024-01-12 百度在线网络技术(北京)有限公司 Training method and device for neural network model
CN110070181A (en) * 2019-04-30 2019-07-30 深圳朴生智能科技有限公司 A kind of optimization method of the deep learning for edge calculations equipment
CN112308197A (en) * 2019-07-26 2021-02-02 杭州海康威视数字技术股份有限公司 Convolutional neural network compression method and device and electronic equipment
CN112308197B (en) * 2019-07-26 2024-04-09 杭州海康威视数字技术股份有限公司 Compression method and device of convolutional neural network and electronic equipment

Similar Documents

Publication Publication Date Title
Luo et al. Autopruner: An end-to-end trainable filter pruning method for efficient deep model inference
CN110084281B (en) Image generation method, neural network compression method, related device and equipment
CN108197707A (en) Compression method of convolutional neural network based on global error reconstruction
Liu et al. Bi-real net: Enhancing the performance of 1-bit cnns with improved representational capability and advanced training algorithm
CN112418292B (en) Image quality evaluation method, device, computer equipment and storage medium
JP5235666B2 (en) Associative matrix method, system and computer program product using bit-plane representation of selected segments
CN112257858A (en) Model compression method and device
Wang et al. TRC‐YOLO: A real‐time detection method for lightweight targets based on mobile devices
CN110472002B (en) Text similarity obtaining method and device
CN108415888A (en) Compression method and system for neural network language model
CN112989120B (en) Video clip query system and video clip query method
CN116580257A (en) Feature fusion model training and sample retrieval method and device and computer equipment
CN113505193A (en) Data processing method and related equipment
Chandrasekhar et al. Compression of deep neural networks for image instance retrieval
CN116977763A (en) Model training method, device, computer readable storage medium and computer equipment
Chung et al. Filter pruning by image channel reduction in pre-trained convolutional neural networks
CN110633787A (en) Deep neural network compression method based on multi-bit neural network nonlinear quantization
CN115022637A (en) Image coding method, image decompression method and device
CN111723912A (en) Neural network decoupling method
CN113850373A (en) Filter pruning method based on categories
CN117725432A (en) Text semantic similarity comparison method, device, equipment and readable storage medium
CN117727022A (en) Three-dimensional point cloud target detection method based on transform sparse coding and decoding
Qiao et al. Two-Stream Convolutional Neural Network for Video Action Recognition
CN117392488A (en) Data processing method, neural network and related equipment
Feng et al. Compression for text detection and recognition based on low bit-width quantization

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
RJ01: Rejection of invention patent application after publication (application publication date: 20180622)