
CN116757255A - Method for improving weight reduction of mobile NetV2 distracted driving behavior detection model - Google Patents

Method for improving weight reduction of mobile NetV2 distracted driving behavior detection model Download PDF

Info

Publication number
CN116757255A
CN116757255A
Authority
CN
China
Prior art keywords
model
training
network
weight reduction
behavior
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310653803.6A
Other languages
Chinese (zh)
Inventor
白雪梅
李佳璐
张晨洁
胡汉平
史新瑞
侯聪聪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changchun University of Science and Technology
Original Assignee
Changchun University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changchun University of Science and Technology filed Critical Changchun University of Science and Technology
Priority to CN202310653803.6A priority Critical patent/CN116757255A/en
Publication of CN116757255A publication Critical patent/CN116757255A/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0495Quantised networks; Sparse networks; Compressed networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/0985Hyperparameter optimisation; Meta-learning; Learning-to-learn
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/59Context or environment of the image inside of a vehicle, e.g. relating to seat occupancy, driver state or inner lighting conditions
    • G06V20/597Recognising the driver's state or behaviour, e.g. attention or drowsiness
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

In recent years, research on recognizing distracted driving behavior has advanced considerably, and deep-learning-based methods have attracted growing attention from researchers. Most models, however, produce large weight files, which makes practical application and deployment difficult, so lightweight improvement of such models is necessary. To address the problems that existing distracted-driving recognition models are too large and poorly suited to low-compute environments, the lightweight network MobileNetV2 is selected as the backbone and improved: a Ghost module replaces the point-wise convolution to reduce computation, and the Leaky ReLU function is added to avoid the dying-neuron problem. On this basis, a channel pruning algorithm further reduces the model parameters. The improved MobileNetV2 network model is trained; finally, an image to be detected is input into the trained detection model, which outputs the driving behavior type.

Description

Method for improving weight reduction of MobileNetV2 distracted driving behavior detection model
Technical Field
The invention relates to the field of deep-learning image classification, and in particular to model weight reduction achieved by improving the MobileNetV2 bottleneck structure and compressing the model.
Background
Traditional methods for detecting distracted driving behavior mainly comprise driving behavior monitoring based on the driver's facial features and monitoring based on the driver's body posture features. However, most of these methods rely on manually extracted texture or shape features, which cannot simultaneously satisfy the accuracy and real-time requirements of driving behavior monitoring and therefore have certain limitations in practical application.
With the development of deep learning, some researchers have proposed deep-learning-based distracted driving behavior recognition, i.e. using deep learning to recognize the driver's face or locate the driver's facial key points to judge whether distracted driving behavior occurs. Although the accuracy is higher, the model structures are large and redundant, computing-resource consumption keeps increasing, training is difficult, and real-time performance is poor, so the deployment requirements of practical applications are hard to meet. In recent years, lightweight network models have become a research hotspot: a network can be reconstructed for smaller volume and higher speed while maintaining accuracy as much as possible, in order to make deployment and application feasible.
The invention builds on MobileNet and its variant MobileNetV2 proposed by Howard et al., which make full use of depthwise separable convolution and the inverted residual structure to achieve real-time, efficient, easily deployed models. However, because of the large amount of computation caused by the 1×1 convolution in MobileNetV2, the weight reduction of such models is still insufficient. Therefore, a Ghost module is used to replace the point-wise convolution in the MobileNetV2 network to reduce computation, the Leaky ReLU function replaces the original activation function to avoid the dying-neuron problem, and a channel pruning algorithm further reduces the model parameters while keeping accuracy high, achieving the goal of network weight reduction.
Disclosure of Invention
The invention classifies 10 driving behaviors: normal driving, texting with the left hand, calling with the left hand, texting with the right hand, calling with the right hand, operating the radio, drinking water, turning the body backward, grooming (hair and makeup), and talking to passengers. The lightweight network model MobileNetV2 serves as the basis of the driving behavior recognition model: a Ghost module replaces the point-wise convolution, reducing a large number of floating-point operations; the Leaky ReLU function replaces the original activation function, avoiding the dying-neuron problem; and a channel pruning algorithm further reduces the model parameters, improving the computational efficiency of the neural network and realizing end-to-end detection of driver distraction behavior. This high-accuracy, high-efficiency, low-memory detection algorithm addresses the fact that most current driving behavior recognition research focuses on improving accuracy while neglecting light weight.
The method can be realized by the following steps:
Step one, download the open-source State Farm data set, divide it into training and test sets at a ratio of 8:2, and preprocess all the data.
Step two, replace the point-wise convolution in the MobileNetV2 network with the Ghost module, and replace the ReLU6 function with the Leaky ReLU function.
Step three, compress the model structure with a channel pruning algorithm.
Step four, set the training hyperparameters and input the data set images into the improved MobileNetV2 network model to obtain a fully trained distracted driving behavior detection model.
Step five, input the image to be detected into the trained detection model and output the driving behavior detection result.
The invention has the following advantages and beneficial effects:
1. The lightweight distracted driving detection model obtains high test accuracy;
2. The method solves the problem of the large computation caused by point-wise convolution in the model, greatly reduces the number of model parameters with only small fluctuation in accuracy, and is beneficial to practical application and deployment.
Drawings
FIG. 1 is a flow chart of the algorithm in the present invention.
Fig. 2 is a schematic representation of different distraction behavior patterns.
Fig. 3 is a schematic diagram of the Ghost module.
Fig. 4 shows a modified MobileNetV2 bottleneck structure.
Fig. 5 is a flow chart of a model pruning algorithm.
Fig. 6 is a schematic diagram of the channel pruning principle.
FIG. 7 shows the accuracy and loss curves during the training process of the proposed algorithm.
Detailed Description
The specific use process of the invention is realized by the following steps:
Step one, download the State Farm data set. The State Farm data set contains 10 classes of actions: normal driving, texting with the left hand, calling with the left hand, texting with the right hand, calling with the right hand, operating the radio, drinking water, turning the body backward, grooming, and talking to passengers. It is a competition data set from the Kaggle platform and was the first publicly downloadable distracted driving behavior recognition data set; it contains 22,424 labeled pictures with an image size of 480 × 460. In this experiment the training and test sets were divided at a ratio of 8:2.
After the data set is acquired, data preprocessing is performed. Because the actual size of the images in the State Farm data set differs from the model's input size, all images are uniformly resized and randomly cropped to the 224 × 224 model input size and then processed by a normalization function; normalized image data speeds up model convergence.
Step two, introduce the Ghost module and the Leaky ReLU function to improve the MobileNetV2 network. Some feature maps produced by convolution are very similar, so one part of the feature maps can be obtained by linear transformation of another part. A portion of intrinsic feature maps is computed first; the remaining feature maps are generated from them with operations of much lower computational cost, and the intrinsic and generated feature maps are concatenated along the channel dimension as the module's output. To address the large computation of point-wise convolution, the Ghost module replaces the point-wise convolution in the MobileNetV2 inverted residual module.
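The intrinsic-plus-cheap-generation idea can be sketched in PyTorch as below. This is a minimal interpretation of the Ghost module; the ratio of 2, the 3×3 depthwise kernel, and the use of LeakyReLU inside the module are illustrative assumptions, not values fixed by the patent.

```python
# Ghost module sketch: a small pointwise convolution produces "intrinsic"
# feature maps, a cheap depthwise convolution generates the remaining
# "ghost" maps from them, and the two sets are concatenated channel-wise.
import math
import torch
import torch.nn as nn

class GhostModule(nn.Module):
    def __init__(self, in_ch, out_ch, ratio=2, dw_kernel=3):
        super().__init__()
        init_ch = math.ceil(out_ch / ratio)       # intrinsic channels
        cheap_ch = out_ch - init_ch               # ghost channels
        self.primary = nn.Sequential(             # replaces the costly 1x1 conv
            nn.Conv2d(in_ch, init_ch, 1, bias=False),
            nn.BatchNorm2d(init_ch),
            nn.LeakyReLU(0.01, inplace=True),
        )
        self.cheap = nn.Sequential(               # depthwise "cheap" operation
            nn.Conv2d(init_ch, cheap_ch, dw_kernel, padding=dw_kernel // 2,
                      groups=init_ch, bias=False),
            nn.BatchNorm2d(cheap_ch),
            nn.LeakyReLU(0.01, inplace=True),
        )

    def forward(self, x):
        y = self.primary(x)                       # intrinsic maps
        return torch.cat([y, self.cheap(y)], dim=1)
```

With ratio=2, only half of the output channels pass through a full pointwise convolution; the rest cost one depthwise pass, which is where the reduction in floating-point operations comes from.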
An activation function is typically used in neural networks to add nonlinearity and improve the expressive power of the model; the ReLU function is given in formula (1). It is a common convolutional-network activation function: for x > 0 no gradient saturation or gradient vanishing occurs, its computational complexity is low, no exponential operation is needed, and the activation is obtained with a single threshold. However, for x ≤ 0 the gradient is 0, so the gradients of that neuron and the neurons behind it remain 0, the neuron no longer responds to any data, and its parameters are never updated; the neuron dies. The Leaky ReLU function introduces a very small slope α for x ≤ 0 on the basis of the ReLU function, as shown in formula (2), which avoids neuron death by supplying a gradient.
ReLU(x) = max(0, x)    (1)
LeakyReLU(x) = x for x > 0; αx for x ≤ 0    (2)
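A quick numeric check of formulas (1) and (2): for negative inputs ReLU outputs zero, so its gradient there is zero (the dying-neuron case), while Leaky ReLU keeps a small slope α. The value α = 0.01 below is PyTorch's default negative slope; the patent does not state the exact α used.

```python
# Compare ReLU and Leaky ReLU on the negative side, including gradients.
import torch
import torch.nn.functional as F

x = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0], requires_grad=True)

relu_out = F.relu(x)                              # negatives clamped to 0
leaky_out = F.leaky_relu(x, negative_slope=0.01)  # negatives scaled by alpha

# Backpropagate through Leaky ReLU: the negative side receives gradient
# alpha instead of zero, so the corresponding weights keep updating.
leaky_out.sum().backward()
print(relu_out.tolist())
print(leaky_out.tolist())
print(x.grad.tolist())
```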
Step three, further reduce the model parameters with a pruning algorithm, which comprises three steps:
(1) Perform scaling-factor sparsification training on the improved MobileNetV2 to obtain a model with sparse scaling factors, in order to find unimportant channels in the network;
(2) In the pruning stage, prune the channels whose scaling factors are below a threshold to obtain the pruned network;
(3) Retrain the pruned network to compensate for the accuracy loss caused by pruning.
During sparse training, each channel of every convolution layer is assigned a scaling factor γ as the basis for pruning. Under the added regularization, the γ values are driven toward 0; each γ is multiplied with its channel's input, so the features extracted by each channel of each layer contribute with different strengths, and the absolute value of the scaling factor represents the importance of the channel. Let z_in and z_out denote the input and output of a BN layer, and B the current mini-batch; the BN layer transformation is shown in (3) and (4):
ẑ = (z_in − μ_B) / √(σ_B² + ε)    (3)
z_out = γ · ẑ + β    (4)
where ẑ denotes the normalized value, μ_B and σ_B are the mean and standard deviation of B, ε is a small constant, and γ and β are the trainable parameters of the BN layer.
Meanwhile, a penalty term on γ is introduced into the network's training objective; the sparse-training loss function is shown in formula (5):
L = Σ_(x,y) l(f(x, W), y) + λ Σ_(γ∈Γ) g(γ)    (5)
In formula (5), (x, y) denotes a training input and its label, W the trainable parameters, and f(x, W) the network output; the first term is the original network training loss and the second term is the L1 regularization on γ, with g(γ) = |γ| as the sparsity-inducing penalty on the scaling factors. The hyperparameter λ balances the normal training loss against the channel-scaling-factor penalty, and Γ denotes the set of scaling factors γ.
After sparsification training, the pre-trained network contains many near-zero scaling factors; the BN-layer scaling factors are sorted, and the channels whose scaling factors fall below the threshold are pruned.
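The threshold step described above can be sketched as follows: gather all BN scaling factors, sort them globally, take the value at a chosen prune ratio as the threshold, and build a per-layer keep mask. Actually removing the channels (rebuilding the convolution layers) is omitted here, and the prune ratio is illustrative.

```python
# Global BN-gamma thresholding sketch for channel pruning: channels whose
# |gamma| falls at or below the global threshold are marked for removal.
import torch
import torch.nn as nn

def bn_channel_masks(model, prune_ratio=0.5):
    gammas = torch.cat([bn.weight.detach().abs().flatten()
                        for bn in model.modules()
                        if isinstance(bn, nn.BatchNorm2d)])
    threshold = torch.sort(gammas).values[int(len(gammas) * prune_ratio)]
    return {name: (bn.weight.detach().abs() > threshold)   # True = keep channel
            for name, bn in model.named_modules()
            if isinstance(bn, nn.BatchNorm2d)}
```

After pruning, the surviving channels' weights are copied into a slimmer network, which is then retrained to recover accuracy.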
Step four, set the training hyperparameters, input the training images into the improved feature extraction network, and train the model to convergence to obtain the fully trained distracted driving behavior detection model.
First, a training image is input into the model; the cross-entropy loss between the prediction and the image's known true label is computed to obtain the loss value. Back propagation is then performed and the model weights are updated with the Adam optimizer. After one pass over the training set, the inputs are fed through the model again for the next iteration, and iteration continues until the loss value or the accuracy no longer fluctuates significantly within a given range, at which point training is finished.
In the invention, the initial learning rate is set to 0.0001 and the batch_size to 16. The network training process uses the cross-entropy function as the distracted-driving loss function (the cross-entropy loss function provided by PyTorch); the formula is shown in (6):
L = −(1/n) Σ_i y_i · log(ŷ_i)    (6)
where n is the number of inputs, y_i is the true one-hot label, and ŷ_i is the predicted probability. Training for 100 epochs yields the final distracted driving behavior network model parameters. Results on the State Farm data set show that the model reaches a test accuracy of 94.66% while the model parameters occupy only 0.23 M.
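A minimal training-loop sketch matching the hyperparameters stated above (Adam, learning rate 0.0001, batch size 16, cross-entropy loss). The tiny linear model and random tensors stand in for the improved MobileNetV2 and the State Farm images; only the loop structure is the point here, and the loop is shortened from the 100 epochs used in the patent.

```python
# Training-loop sketch: forward pass, cross-entropy loss, back propagation,
# Adam update -- repeated per epoch, as described in step four.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 10))  # placeholder net
criterion = nn.CrossEntropyLoss()                  # formula (6)
optimizer = torch.optim.Adam(model.parameters(), lr=0.0001)

images = torch.randn(16, 3, 224, 224)              # one fake batch of 16
labels = torch.randint(0, 10, (16,))               # 10 behavior classes

for epoch in range(2):                             # patent trains 100 epochs
    optimizer.zero_grad()
    loss = criterion(model(images), labels)        # forward + loss value
    loss.backward()                                # back propagation
    optimizer.step()                               # Adam weight update
```

In a full run the inner statements iterate over a DataLoader of preprocessed State Farm batches, and training stops once the loss and accuracy stabilize.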
Step five, input the acquired image to be detected into the trained distracted driving behavior detection model and output the behavior type, i.e. the prediction result.

Claims (6)

1. A method for improving weight reduction of a MobileNetV2 distracted driving behavior detection model, characterized in that the method is realized by the following steps:
step one, downloading the open-source State Farm data set, dividing it into training and test sets at a ratio of 8:2, and preprocessing all the data;
step two, selecting MobileNetV2 as the model, replacing the point-wise convolution with the Ghost module, and replacing the original function with the Leaky ReLU function;
step three, compressing the model structure with a channel pruning algorithm;
step four, setting the training hyperparameters and inputting the training set images into the improved MobileNetV2 network model to obtain a fully trained distracted driving behavior detection model;
step five, inputting the image to be tested into the trained detection model and outputting the driving behavior type.
2. The method for improving weight reduction of a MobileNetV2 distracted driving behavior detection model according to claim 1, characterized in that: before training, the images are preprocessed; the images in the data set are uniformly resized to the 224 × 224 model input size, and the preprocessed image data set is obtained through normalization.
3. The method for improving weight reduction of a MobileNetV2 distracted driving behavior detection model according to claim 1, characterized in that: to address the large computation of point-wise convolution, a Ghost module replaces the point-wise convolution in MobileNetV2; the Leaky ReLU function introduces a very small slope α for x ≤ 0 on the basis of the original function, avoiding neuron death by supplying a gradient.
4. The method for improving weight reduction of a MobileNetV2 distracted driving behavior detection model according to claim 1, characterized in that the pruning algorithm in step three comprises: first, to find unimportant channels in the network, performing scaling-factor sparsification training on the improved MobileNetV2 to obtain a model with sparse scaling factors; then, in the pruning stage, pruning the channels whose scaling factors are below a threshold to obtain the pruned network; and retraining the pruned network to compensate for the accuracy loss caused by pruning.
5. The method for improving weight reduction of a MobileNetV2 distracted driving behavior detection model according to claim 1, characterized in that: since a batch normalization (BN) layer follows each convolution layer in the detection network, let z_in and z_out denote the input and output of the BN layer, and B the current mini-batch; the BN layer transformation is shown in (1) and (2):
ẑ = (z_in − μ_B) / √(σ_B² + ε)    (1)
z_out = γ · ẑ + β    (2)
where ẑ denotes the normalized value, μ_B and σ_B are the mean and standard deviation of B, ε is a small constant, and γ and β are the trainable parameters of the BN layer; it can be seen that the BN layer can rescale the standardized linear activation to various scales.
Meanwhile, a penalty term on γ is introduced into the network's training objective; the sparse-training loss function is shown in formula (3):
L = Σ_(x,y) l(f(x, W), y) + λ Σ_(γ∈Γ) g(γ)    (3)
where (x, y) denotes a training input and its label, W the trainable parameters, and f(x, W) the network output; the first term is the original network training loss and the second term is the L1 regularization on γ, with g(γ) = |γ| as the sparsity-inducing penalty on the scaling factors; the hyperparameter λ balances the normal training loss against the channel-scaling-factor penalty term, and Γ denotes the set of scaling factors γ.
6. The method for improving weight reduction of a MobileNetV2 distracted driving behavior detection model according to claim 1, characterized in that: the hyperparameters are set with an initial learning rate of 0.0001 and a batch_size of 16, and 100 epochs are trained to obtain the final distracted driving behavior network model parameters.
CN202310653803.6A 2023-06-05 2023-06-05 Method for improving weight reduction of mobile NetV2 distracted driving behavior detection model Pending CN116757255A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310653803.6A CN116757255A (en) 2023-06-05 2023-06-05 Method for improving weight reduction of mobile NetV2 distracted driving behavior detection model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310653803.6A CN116757255A (en) 2023-06-05 2023-06-05 Method for improving weight reduction of mobile NetV2 distracted driving behavior detection model

Publications (1)

Publication Number Publication Date
CN116757255A true CN116757255A (en) 2023-09-15

Family

ID=87947008

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310653803.6A Pending CN116757255A (en) 2023-06-05 2023-06-05 Method for improving weight reduction of mobile NetV2 distracted driving behavior detection model

Country Status (1)

Country Link
CN (1) CN116757255A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117910536A (en) * 2024-03-19 2024-04-19 浪潮电子信息产业股份有限公司 Text generation method, and model gradient pruning method, device, equipment and medium thereof
CN117910536B (en) * 2024-03-19 2024-06-07 浪潮电子信息产业股份有限公司 Text generation method, and model gradient pruning method, device, equipment and medium thereof

Similar Documents

Publication Publication Date Title
CN110020682B (en) Attention mechanism relation comparison network model method based on small sample learning
CN108764471B (en) Neural network cross-layer pruning method based on feature redundancy analysis
CN110135580B (en) Convolution network full integer quantization method and application method thereof
CN111461322B (en) Deep neural network model compression method
CN111079781A (en) Lightweight convolutional neural network image identification method based on low rank and sparse decomposition
CN109902745A (en) A kind of low precision training based on CNN and 8 integers quantization inference methods
CN112699958A (en) Target detection model compression and acceleration method based on pruning and knowledge distillation
Paupamah et al. Quantisation and pruning for neural network compression and regularisation
CN107085704A (en) Fast face expression recognition method based on ELM own coding algorithms
CN104866900A (en) Deconvolution neural network training method
Yang et al. Harmonious coexistence of structured weight pruning and ternarization for deep neural networks
CN112329922A (en) Neural network model compression method and system based on mass spectrum data set
CN114186672A (en) Efficient high-precision training algorithm for impulse neural network
CN112183742A (en) Neural network hybrid quantization method based on progressive quantization and Hessian information
CN115966010A (en) Expression recognition method based on attention and multi-scale feature fusion
CN117333497A (en) Mask supervision strategy-based three-dimensional medical image segmentation method for efficient modeling
CN113610227A (en) Efficient deep convolutional neural network pruning method
Li et al. A deep learning method for material performance recognition in laser additive manufacturing
CN113516133A (en) Multi-modal image classification method and system
CN116757255A (en) Method for improving weight reduction of mobile NetV2 distracted driving behavior detection model
CN110942106A (en) Pooling convolutional neural network image classification method based on square average
Qi et al. Learning low resource consumption cnn through pruning and quantization
CN116343109A (en) Text pedestrian searching method based on self-supervision mask model and cross-mode codebook
CN115223158A (en) License plate image generation method and system based on adaptive diffusion prior variation self-encoder
CN113962262A (en) Radar signal intelligent sorting method based on continuous learning

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination