CN107506722A - Face emotion recognition method based on a deep sparse convolutional neural network - Google Patents
Face emotion recognition method based on a deep sparse convolutional neural network
- Publication number
- CN107506722A (application number CN201710714001.6A)
- Authority
- CN
- China
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/174—Facial expression recognition
- G06V40/175—Static expression
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
- G06F18/2136—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on sparsity criteria, e.g. with an overcomplete basis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/061—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using biological neurons, e.g. biological neurons connected to an integrated circuit
Abstract
The invention provides a face emotion recognition method based on a deep sparse convolutional neural network: the emotion image is first preprocessed, emotional features are then extracted, and finally the emotional features are recognized and classified. In the method, the Nesterov accelerated gradient descent (NAGD) algorithm is used to optimize the weights of the deep sparse convolutional neural network so that the network structure becomes optimal and the generalization of the face emotion recognition algorithm is improved. Because NAGD looks ahead, it can predictively keep the algorithm from advancing too fast or too slow, while also enhancing the responsiveness of the algorithm and reaching a better local optimum.
Description
Technical Field
The invention relates to a face emotion recognition method based on a deep sparse convolution neural network, and belongs to the field of pattern recognition.
Background
In recent years, with the development of various technologies, the degree of intelligence in society has continuously improved, and people are increasingly eager to experience natural and harmonious human-computer interaction. However, emotion has long been a gap between humans and machines that is difficult to bridge. Therefore, breaking through the bottleneck of current affective computing is the key to the development of the artificial emotion field. Facial expression is one of the important channels of human emotional expression, and face emotion recognition has clear application value in fields such as human-computer interaction, fatigue-driving detection, remote nursing and pain assessment, with very broad application prospects. More accurate expression recognition can therefore promote the intelligent development of society.
Face emotion recognition can be divided mainly into emotional feature extraction and emotional feature recognition and classification. Face emotion recognition is still at the laboratory stage and cannot yet recognize the other party's expression as naturally and smoothly as a human does during human-computer interaction. Existing face emotion recognition algorithms have difficulty extracting emotional features accurately, their complexity is high, their recognition time is long, and they cannot meet the real-time requirements of human-computer interaction. Therefore, extracting features that differ clearly between expressions, classifying expressions of different forms more accurately, and improving algorithm efficiency are the keys to realizing face emotion recognition.
Deep learning is a new field of machine learning research; its motivation is to build neural networks that simulate the way the human brain analyzes, learns from, and interprets data. The deep sparse convolutional neural network is a neural network composed of a convolutional neural network, Dropout and Softmax regression, and is one kind of deep learning model. It introduces randomized sparsity into the deep convolutional network through the Dropout layer, which improves the training efficiency of the network; because the network structure optimized in each training pass is different, the optimization of the weights does not depend on the joint action of neurons in fixed relations, the co-adaptation between neurons is weakened (analogous to sexual reproduction in natural selection), and the generalization ability of the network is improved. In deep learning the choice of optimization algorithm is very important; previous studies usually consider only the design of the network structure, and the traditional gradient descent algorithm easily falls into a poor local optimum, giving the neural network poor generalization performance.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a face emotion recognition method based on a deep sparse convolutional neural network in which the Nesterov Accelerated Gradient Descent (NAGD) algorithm is selected to optimize the weights of the deep sparse convolutional neural network, so that the network structure becomes optimal and the generalization of the face emotion recognition algorithm is improved. Because NAGD looks ahead, it can predictively keep the algorithm from advancing too fast or too slow, while also enhancing the responsiveness of the algorithm and obtaining a better local optimum.
The technical scheme adopted by the invention for solving the technical problem is as follows: the method for recognizing the human face emotion based on the deep sparse convolution neural network comprises the following steps of:
(1) preprocessing the emotion image: firstly, carrying out rotation correction and face cutting processing on an emotion image sample to be recognized, extracting an emotion characteristic key area, normalizing the image to be of a uniform size, and then carrying out histogram equalization on the emotion image to obtain a preprocessed emotion image;
(2) extracting emotional characteristics: firstly, extracting main component emotional characteristics of a preprocessed emotional image based on a PCA method to obtain characteristic data of different emotions; then, whitening the extracted feature data to obtain a PCA feature map of the emotion image to be recognized;
(3) and (3) emotion feature identification and classification: the method comprises the steps of constructing a deep sparse convolution neural network consisting of a convolution layer, a sub-sampling layer, a Dropout layer and a Softmax regression layer, firstly inputting a PCA characteristic diagram of a training set and a label value corresponding to emotion into the deep sparse convolution neural network, optimizing the deep sparse convolution neural network by adopting a Nesterov accelerated gradient descent algorithm, then inputting the PCA characteristic diagram of an emotion image to be recognized into the deep sparse convolution neural network, and outputting a recognition result, namely the label value corresponding to the emotion type.
The emotion image preprocessing in the step (1) specifically comprises the following processes:
(1-1) calibrating three feature points of two eyes and a nose tip in the emotion image to obtain coordinate values of the three feature points;
(1-2) rotating the emotion image according to the coordinate values of the left eye and the right eye, enabling the two eyes to be on the same horizontal line, setting the interval between the two eyes to be d, and setting the middle point of the interval to be O;
(1-3) cutting the face according to the facial feature points: with O as the reference, a rectangular region extending d to the left and to the right in the horizontal direction and 0.5d upward and 1.5d downward in the vertical direction is cut out; this rectangular region is the face emotion sub-region;
(1-4) transforming the scale of the face emotion subarea into a uniform size of 128 x 128 pixels;
and (1-5) carrying out histogram equalization on the emotion subarea of the human face to obtain a preprocessed emotion image.
The extraction of the emotional features in the step (2) specifically comprises the following steps:
(2-1) performing mean normalization by subtracting the average brightness value mu of the emotion image to make the characteristic mean values of data in the emotion image be around 0, specifically comprising the following steps:
the preprocessed emotion image data of size 128 × 128 are stored in a 128 × 128 matrix, i.e. {x′^(1), x′^(2), …, x′^(n)}, x′^(i) ∈ R^n with n = 128, and each preprocessed emotion image is zero-averaged using equations (1) and (2):
μ = (1/n) Σ_{i=1}^{n} x′^(i)    (1)
x′^(i) = x′^(i) - μ    (2)
(2-2) calculating the eigenvector matrix U of the covariance matrix Σ of the zero-averaged emotion image, where Σ is computed as:
Σ = (1/n) Σ_{i=1}^{n} (x′^(i)) (x′^(i))^T    (3)
the emotion image pixel values x′ are then expressed in the basis {u_1, u_2, …, u_n} formed by the eigenvectors in U:
x′_rot = U^T x′ = [u_1^T x′, u_2^T x′, …, u_n^T x′]^T    (4)
(2-3) selecting the first k principal components of x′_rot so as to retain 99% of the variance, i.e. the minimum k satisfying equation (5):
(Σ_{j=1}^{k} λ_j) / (Σ_{j=1}^{n} λ_j) ≥ 0.99    (5)
where λ_j denotes the jth eigenvalue corresponding to the eigenvectors in U;
(2-4) setting all components of x′_rot other than the k retained principal components to zero to obtain x̃′, the approximate representation of x′_rot:
x̃′ = [x′_rot,1, …, x′_rot,k, 0, …, 0]^T ≈ [x′_rot,1, …, x′_rot,k, x′_rot,k+1, …, x′_rot,n]^T = x′_rot    (6)
(2-5) scaling x̃′ so that the individual features are decorrelated and all have unit variance:
x′_PCAwhite,i = x̃′_i / √(λ_i + ε)    (7)
(2-6) applying ZCA whitening, i.e. transforming back with the eigenvector matrix U, so that the covariance matrix of the emotion image becomes the identity matrix I:
x′_ZCAwhite = U x′_PCAwhite    (8)
x′_ZCAwhite is the PCA feature map of the emotion image to be recognized.
The label values 1 to 7 in step (3) correspond one-to-one to the 7 emotion categories: anger, disgust, fear, happiness, neutral, sadness, and surprise.
The emotion feature identification and classification in the step (3) specifically comprises the following processes:
(3-1) creating a deep sparse convolutional neural network consisting, in order, of a convolutional layer, a sub-sampling layer, a Dropout layer and a Softmax regression layer, and inputting the training set data into it; the training set data comprise the PCA feature maps of the training set and the label values of the corresponding emotions, i.e. {(x_1, y_1), …, (x_m, y_m)} with y_i ∈ {1, 2, …, k}, where x_i is a PCA feature map of the training set, y_i is the emotion label value corresponding to x_i, and i ∈ {1, 2, …, m}; the deep sparse convolutional neural network is trained iteratively with the NAGD algorithm, which comprises the following process:
(3-1-1) randomly shuffling the training set data, grouping the data in the training set, wherein the number of the data in each group is consistent, and sequentially inputting each group into a deep sparse convolution neural network;
(3-1-2) each group of training set data firstly passes through a convolutional layer respectively, wherein the convolutional layer is provided with 100 convolutional kernels with the dimensionality of 29 multiplied by 29, and the moving step length of the convolutional kernels is 1; the deep sparse convolution neural network excavates local correlation information in a PCA characteristic diagram of a training set through convolution kernels, and the implementation process of the convolution layers is as follows:
a_{i,k} = f(x_i * rot90(W_k, 2) + b_k)    (9)
where a_{i,k} is the convolution feature map obtained by convolving (a 'valid' convolution operation) the ith PCA feature map x_i of the training set with the kth convolution kernel of the convolutional layer, W_k is the weight of the kth convolution kernel, b_k is the bias corresponding to the kth convolution kernel, and f(·) is a Sigmoid-type activation function:
f(x) = 1 / (1 + e^(-x))    (10)
(3-1-3) inputting the convolution feature maps generated by the convolutional layer into a sub-sampling layer, which uses average pooling with pooling dimension 4 and moving step 4; after a convolution feature map passes through the sub-sampling layer, the pooled feature map becomes one fourth of the original size while the number of feature maps is unchanged; the average pooling uses the following formula:
c_j = f(a_{i,k} * (1/p²))    (11)
where c_j is the jth pooling feature map generated by the sub-sampling layer and p is the average pooling dimension;
(3-1-4) using the Dropout layer to reduce over-fitting of the network: elements of the data passing through the Dropout layer, i.e. the pooling feature maps generated in step (3-1-3), are randomly deactivated (set to zero) with a set probability, and the remaining data are kept; the calculation process is:
DropoutTrain(x)=RandomZero(p)×x (12)
where DropoutTrain(x) denotes the data matrix obtained after passing through the Dropout layer in the training stage, and RandomZero(p) denotes setting values of the data matrix x input to this layer to 0 with the set probability p;
(3-1-5) classifying and identifying rows of the data matrix obtained after passing through a Dropout layer by utilizing a Softmax regression layer:
(3-1-5-1) using the hypothesis function h_θ(x) to calculate the probability value p(y = j | x) that the data matrix obtained after the Dropout layer belongs to each expression category j; h_θ(x) is a k-dimensional vector whose elements correspond to the probability values of the k categories and sum to 1, and h_θ(x) has the form:
h_θ(x^(i)) = [p(y^(i)=1 | x^(i); θ), …, p(y^(i)=k | x^(i); θ)]^T = (1 / Σ_{j=1}^{k} e^(θ_j^T x^(i))) [e^(θ_1^T x^(i)), …, e^(θ_k^T x^(i))]^T    (13)
where θ_1, θ_2, …, θ_k ∈ R^(n+1) are the parameters of the model, assigned randomly at the start of training, and x^(i) denotes the ith pooling feature map data in the data matrix obtained after the Dropout layer;
(3-1-5-2) the Softmax regression layer evaluates the classification effect using a cost function J (θ):
where 1{y^(i) = j} is an indicator function whose rule is 1{a true expression} = 1, e.g. 1{1 + 1 = 3} = 0 and 1{1 + 1 = 2} = 1, and y^(i) denotes the emotion label value;
the above formula is derived to obtain a gradient formula:
where λ is the coefficient of the weight decay term in equation (15) and is a preset value;
(3-1-6) calculating residual errors of all layers and gradients of network parameters theta in a cost function J (W, b; x, y) in Softmax regression by using a back propagation algorithm, and specifically comprising the following steps of:
(3-1-6-1) if the l-th layer is fully connected to the l + 1-th layer, the residual calculation of the l-layer uses the following formula:
δ^(l) = ((W^(l))^T δ^(l+1)) · f′(z^(l))    (16)
the gradient calculation formula of the parameter W is:
the gradient calculation formula of the parameter b is as follows:
where δ^(l+1) is the residual of the (l+1)th layer of the network, J(W, b; x, y) is the cost function, (W, b) are the weight and threshold parameters, and (x, y) are the training data and label respectively;
(3-1-6-2) if the l-th layer is a convolutional layer and the l + 1-th layer is a sub-sampling layer, the residual is propagated by the following equation:
where k is the index of the convolution kernel, z_k^(l) denotes x_i * rot90(W_k, 2) + b_k, and f′(·) is the partial derivative of the Sigmoid-type activation function, of the form:
(3-1-7) according to the calculated gradient of θ, NAGD uses a momentum term γ v_{t-1} to update the parameter θ: computing θ - γ v_{t-1} gives an approximation of the future position of the parameter θ, and the NAGD update formulas are:
v_t = γ v_{t-1} + α ∇_θ J(θ - γ v_{t-1})    (21)
θ = θ - v_t    (22)
where ∇_θ J(θ - γ v_{t-1}) is the gradient with respect to the parameter θ computed from the training set (x^(i), y^(i)) at the look-ahead point, α is the learning rate, v_t is the current velocity vector, and v_{t-1} is the velocity vector of the previous iteration; α is initially set to 0.1, v_t is initialized to 0 with the same dimension as the parameter vector θ, and γ ∈ (0, 1]; γ is set to 0.5 at the start of training and increased to 0.95 after the first training iteration;
(3-1-8) returning to the step (3-1-1) until the set iteration times are reached, and finishing the training optimization of the deep sparse convolution neural network;
(3-2) inputting the PCA feature map of the emotion image to be recognized into a deep sparse convolution neural network, and recognizing and classifying the feature map:
(3-2-1) the PCA feature map of the emotion image to be recognized first passes through the convolutional layer and the sub-sampling layer: substituting x′_ZCAwhite for the input x_i in equation (9) gives the convolution feature map a′_{i,k} obtained by convolving the input PCA feature map of the emotion image to be recognized with the kth convolution kernel of the convolutional layer;
then substituting a′_{i,k} for a_{i,k} in equation (11) gives the pooling feature map c′ of the emotion image to be recognized, i.e. the high-level emotional features;
(3-2-2) when the pooled feature map c 'of the emotion image to be recognized continues to pass through a Dropout layer, carrying out average processing on c':
DropoutTest(c′)=(1-p)×c′ (23)
where DropoutTest(c′) denotes the data matrix obtained after the pooling feature map c′ of the emotion image to be recognized passes through the Dropout layer;
(3-2-3) using the hypothesis function h_θ(x) of the Softmax regression layer to calculate the probability value of c′ for each expression category j, and outputting the category j with the largest probability value, i.e. outputting the classification result.
The invention has the beneficial effects based on the technical scheme that:
the invention introduces the randomized sparsity in the deep sparse convolution neural network through the Dropout layer, and the network structures optimized by each training are different, so that the optimization of the weight value does not depend on the combined action of neurons with fixed relation, the joint adaptability among the neurons is weakened, and the generalization capability and the training efficiency of the network are improved. Optimizing the weight of the deep convolutional neural network by adopting NAGD to optimize the network structure; compared with the traditional gradient descent algorithm, the NAGD has the capability of predicting, predictably preventing the algorithm from advancing too fast or too slow, simultaneously enhancing the response capability of the algorithm and obtaining a better local optimal value.
Drawings
FIG. 1 is a general flow diagram of the present invention.
FIG. 2 is a schematic diagram of emotion image preprocessing.
FIG. 3 is a face emotion feature image after feature extraction based on PCA.
FIG. 4 is a schematic diagram of a deep sparse convolutional neural network.
FIG. 5 shows partial image samples from the JAFFE and CK+ databases.
Fig. 6 is a line graph showing the influence of p on the recognition effect and training time in the Dropout layer.
FIG. 7 is a comparison of images before and after the symmetric transformation.
FIG. 8 shows the experimental confusion matrices.
FIG. 9 is a topological structure diagram of a human-computer interaction system based on human face emotion recognition.
FIG. 10 is the GUI system debugging interface.
Detailed Description
The invention is further illustrated by the following figures and examples.
The invention provides a face emotion recognition method based on a deep sparse convolutional neural network; a general flow diagram of the method is shown in FIG. 1. First, the emotion image sample is preprocessed, i.e. the orientation of the face is corrected, the face is cropped, and histogram equalization is applied; then the bottom-level emotional features are extracted based on PCA; finally, the constructed deep sparse convolutional neural network is used to mine and learn the high-level emotional features and to recognize and classify them, and NAGD is used to train and optimize the network weights so as to optimize the whole network structure and improve the face emotion recognition performance.
The method for recognizing the human face emotion based on the deep sparse convolution neural network can be mainly divided into three parts, namely emotion image preprocessing, emotion feature extraction and emotion feature recognition classification, and the realization process comprises the following steps:
(1) preprocessing the emotion image: as shown in fig. 2, firstly, performing rotation correction and face clipping on an emotion image sample to be recognized, extracting an emotion feature key region, normalizing the image to a uniform size, and then performing histogram equalization on the emotion image to obtain a preprocessed emotion image; the method specifically comprises the following steps:
(1-1) manually calibrating the three feature points of the two eyes and the nose tip in the emotion image with the function [x, y] = ginput(3), obtaining the coordinate values of the three feature points;
(1-2) rotating the emotion image according to the coordinate values of the left eye and the right eye, enabling the two eyes to be on the same horizontal line, setting the interval between the two eyes to be d, and setting the middle point of the interval to be O;
(1-3) cutting the face according to the facial feature points: with O as the reference, a rectangular region extending d to the left and to the right in the horizontal direction and 0.5d upward and 1.5d downward in the vertical direction is cut out; this rectangular region is the face emotion sub-region;
(1-4) transforming the scale of the face emotion subarea into a uniform size of 128 x 128 pixels;
and (1-5) carrying out histogram equalization on the emotion subarea of the human face to obtain a preprocessed emotion image.
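The preprocessing of steps (1-1) to (1-5) can be illustrated with a short sketch. The following Python/OpenCV fragment is only one illustrative reading of those steps, not the patent's implementation: it assumes a grayscale uint8 input image and manually supplied eye coordinates (e.g. obtained with ginput(3)), and the cropping proportions follow one reading of step (1-3).

```python
import cv2
import numpy as np

def preprocess_emotion_image(img, left_eye, right_eye, out_size=128):
    """Rotation correction, face cropping, resizing and histogram
    equalization, roughly following steps (1-1) to (1-5)."""
    (lx, ly), (rx, ry) = left_eye, right_eye
    # Rotate so that both eyes lie on the same horizontal line.
    angle = np.degrees(np.arctan2(ry - ly, rx - lx))
    center = ((lx + rx) / 2.0, (ly + ry) / 2.0)   # midpoint O of the eye segment
    rot = cv2.getRotationMatrix2D(center, angle, 1.0)
    rotated = cv2.warpAffine(img, rot, (img.shape[1], img.shape[0]))

    # Crop the face emotion sub-region around O: d to the left/right,
    # 0.5d above and 1.5d below, where d is the distance between the eyes.
    d = np.hypot(rx - lx, ry - ly)
    ox, oy = int(center[0]), int(center[1])
    face = rotated[max(oy - int(0.5 * d), 0): oy + int(1.5 * d),
                   max(ox - int(d), 0): ox + int(d)]

    # Normalize to a uniform 128 x 128 size and equalize the histogram.
    face = cv2.resize(face, (out_size, out_size))
    return cv2.equalizeHist(face)
```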
(2) Extracting emotional characteristics: firstly, extracting main component emotional characteristics of a preprocessed emotional image based on a PCA method to obtain characteristic data which are different among different emotions and easy to process; and whitening the extracted feature data to obtain a PCA feature map of the emotion image to be recognized. The obtained face emotion image after extracting emotion features based on PCA is shown in FIG. 3, and specifically comprises the following steps:
(2-1) performing mean normalization by subtracting the average brightness value mu of the emotion image to make the characteristic mean values of data in the emotion image be around 0, specifically comprising the following steps:
the preprocessed emotion image data of size 128 × 128 are stored in a 128 × 128 matrix, i.e. {x′^(1), x′^(2), …, x′^(n)}, x′^(i) ∈ R^n with n = 128, and each preprocessed emotion image is zero-averaged using equations (1) and (2):
μ = (1/n) Σ_{i=1}^{n} x′^(i)    (1)
x′^(i) = x′^(i) - μ    (2)
(2-2) calculating the eigenvector matrix U of the covariance matrix Σ of the zero-averaged emotion image, where Σ is computed as:
Σ = (1/n) Σ_{i=1}^{n} (x′^(i)) (x′^(i))^T    (3)
the emotion image pixel values x′ are then expressed in the basis {u_1, u_2, …, u_n} formed by the eigenvectors in U:
x′_rot = U^T x′ = [u_1^T x′, u_2^T x′, …, u_n^T x′]^T    (4)
(2-3) selecting the first k principal components of x′_rot so as to retain 99% of the variance, i.e. the minimum k satisfying equation (5):
(Σ_{j=1}^{k} λ_j) / (Σ_{j=1}^{n} λ_j) ≥ 0.99    (5)
where λ_j denotes the jth eigenvalue corresponding to the eigenvectors in U;
(2-4) setting all components of x′_rot other than the k retained principal components to zero to obtain x̃′, the approximate representation of x′_rot:
x̃′ = [x′_rot,1, …, x′_rot,k, 0, …, 0]^T ≈ [x′_rot,1, …, x′_rot,k, x′_rot,k+1, …, x′_rot,n]^T = x′_rot    (6)
(2-5) scaling x̃′ so that the individual features are decorrelated and all have unit variance:
x′_PCAwhite,i = x̃′_i / √(λ_i + ε)    (7)
(2-6) applying ZCA whitening, i.e. transforming back with the eigenvector matrix U, so that the covariance matrix of the emotion image becomes the identity matrix I:
x′_ZCAwhite = U x′_PCAwhite    (8)
x′_ZCAwhite is the PCA feature map of the emotion image to be recognized.
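As a rough sketch of steps (2-1) to (2-6) (equations (1)-(8)), the NumPy fragment below treats the columns of one preprocessed 128 × 128 image as the n = 128 samples, as the description does; the eigen-decomposition is obtained from an SVD of the covariance matrix, and eps plays the role of ε in equation (7). This is a minimal illustration under those assumptions, not the patent's code.

```python
import numpy as np

def zca_feature_map(x, retain=0.99, eps=1e-5):
    """PCA/ZCA whitening of one preprocessed 128x128 emotion image,
    loosely following equations (1)-(8)."""
    x = x.astype(np.float64)
    x = x - x.mean()                        # zero-averaging, equations (1)-(2)
    n = x.shape[0]
    sigma = (x @ x.T) / n                   # covariance matrix, equation (3)
    U, S, _ = np.linalg.svd(sigma)          # columns of U: eigenvectors, S: eigenvalues
    x_rot = U.T @ x                         # projection onto the eigenbasis, equation (4)
    # Keep the first k components that retain 99% of the variance, equations (5)-(6).
    k = int(np.searchsorted(np.cumsum(S) / np.sum(S), retain)) + 1
    x_tilde = np.zeros_like(x_rot)
    x_tilde[:k] = x_rot[:k]
    x_pca_white = x_tilde / np.sqrt(S + eps)[:, None]   # unit variance, equation (7)
    return U @ x_pca_white                  # ZCA whitening, equation (8)
```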
(3) Emotion feature recognition and classification: a deep sparse convolutional neural network consisting of a convolutional layer, a sub-sampling layer (pooling layer), a Dropout layer and a Softmax regression layer, as shown in FIG. 4, is constructed; the convolutional layer, pooling layer and Dropout layer mine and learn high-level emotional features, and the Softmax regression layer recognizes and classifies the learned emotional features and outputs the classification result, i.e. the label value corresponding to the emotion category. The label values 1 to 7 correspond one-to-one to the 7 emotion categories: anger, disgust, fear, happiness, neutral, sadness, and surprise.
Firstly, inputting PCA characteristic graphs of a training set and label values corresponding to emotions into a deep sparse convolutional neural network, optimizing the deep sparse convolutional neural network by adopting a Nesterov accelerated gradient descent algorithm to optimize a network structure so as to improve the generalization of a face emotion recognition algorithm, and after network training is finished, storing an optimal weight of the network to obtain the optimized deep sparse convolutional neural network. And then, in a testing stage, inputting a testing set, namely a PCA characteristic diagram of the emotion image to be recognized, into the deep sparse convolution neural network, and outputting a recognition result, namely a label value corresponding to the emotion category. The method specifically comprises the following steps:
(3-1) creating a deep sparse convolutional neural network consisting, in order, of a convolutional layer, a sub-sampling layer, a Dropout layer and a Softmax regression layer, and inputting the training set data into it; the training set data comprise the PCA feature maps of the training set and the label values of the corresponding emotions, i.e. {(x_1, y_1), …, (x_m, y_m)} with y_i ∈ {1, 2, …, k}, where x_i is a PCA feature map of the training set, y_i is the emotion label value corresponding to x_i, and i ∈ {1, 2, …, m}; the deep sparse convolutional neural network is trained iteratively with the NAGD algorithm, which comprises the following process:
(3-1-1) randomly shuffling the training set data, grouping the data in the training set, wherein the number of the data in each group is consistent, and sequentially inputting each group into a deep sparse convolution neural network;
(3-1-2) each group of training set data firstly passes through a convolutional layer respectively, wherein the convolutional layer is provided with 100 convolutional kernels with the dimensionality of 29 multiplied by 29, and the moving step length of the convolutional kernels is 1; the deep sparse convolution neural network excavates local correlation information in a PCA characteristic diagram of a training set through convolution kernels, and the implementation process of the convolution layers is as follows:
a_{i,k} = f(x_i * rot90(W_k, 2) + b_k)    (9)
where a_{i,k} is the convolution feature map obtained by convolving (a 'valid' convolution operation) the ith PCA feature map x_i of the training set with the kth convolution kernel of the convolutional layer, W_k is the weight of the kth convolution kernel, b_k is the bias corresponding to the kth convolution kernel, and f(·) is a Sigmoid-type activation function:
f(x) = 1 / (1 + e^(-x))    (10)
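A minimal NumPy/SciPy sketch of the convolutional layer of equation (9) follows; the kernels, biases and input are placeholders, and scipy.signal.convolve2d performs a true 'valid' convolution, so passing rot90(W_k, 2) mirrors the formula literally (the net effect is a cross-correlation of x with W_k itself).

```python
import numpy as np
from scipy.signal import convolve2d

def sigmoid(z):
    # Sigmoid-type activation f of equation (10)
    return 1.0 / (1.0 + np.exp(-z))

def conv_layer(x, kernels, biases):
    """Valid convolution of one 128x128 PCA feature map x with each 29x29
    kernel, as in equation (9): a_k = f(x * rot90(W_k, 2) + b_k)."""
    maps = []
    for W_k, b_k in zip(kernels, biases):
        maps.append(sigmoid(convolve2d(x, np.rot90(W_k, 2), mode='valid') + b_k))
    return np.stack(maps)   # 100 maps of size (128 - 29 + 1) x (128 - 29 + 1) = 100 x 100
```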
(3-1-3) inputting the convolution feature maps generated by the convolutional layer into a sub-sampling layer, which uses average pooling with pooling dimension 4 and moving step 4; after a convolution feature map passes through the sub-sampling layer, the pooled feature map becomes one fourth of the original size while the number of feature maps is unchanged; the average pooling uses the following formula:
c_j = f(a_{i,k} * (1/p²))    (11)
where c_j is the jth pooling feature map generated by the sub-sampling layer and p is the average pooling dimension;
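A sketch of the non-overlapping 4 × 4 average pooling of step (3-1-3) follows; note that equation (11) additionally wraps the averaged map in the activation f(·), which is omitted here for brevity.

```python
import numpy as np

def average_pool(feature_map, p=4):
    """Non-overlapping p x p average pooling with stride p;
    a 100x100 convolution map becomes a 25x25 pooled map."""
    h, w = feature_map.shape
    trimmed = feature_map[:h - h % p, :w - w % p]
    return trimmed.reshape(h // p, p, w // p, p).mean(axis=(1, 3))
```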
(3-1-4) using the Dropout layer to reduce over-fitting of the network: elements of the data passing through the Dropout layer, i.e. the pooling feature maps generated in step (3-1-3), are randomly deactivated (set to zero) with a set probability, and the remaining data are kept; the calculation process is:
DropoutTrain(x)=RandomZero(p)×x (12)
where DropoutTrain(x) denotes the data matrix obtained after passing through the Dropout layer in the training stage, and RandomZero(p) denotes setting values of the data matrix x input to this layer to 0 with the set probability p;
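The training-time Dropout of equation (12) can be sketched as below; RandomZero(p) is implemented here as an independent Bernoulli mask, which is an assumption consistent with, but not spelled out in, the description.

```python
import numpy as np

def dropout_train(x, p=0.5, rng=None):
    """Training-time Dropout, equation (12): each element of the pooled
    feature maps is zeroed independently with probability p."""
    rng = rng or np.random.default_rng()
    mask = rng.random(x.shape) >= p    # keep an element with probability 1 - p
    return mask * x
```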
(3-1-5) classifying and identifying the input data by utilizing a Softmax regression layer:
(3-1-5-1) using the hypothesis function h_θ(x) to calculate the probability value p(y = j | x) that the data matrix obtained after the Dropout layer belongs to each expression category j; h_θ(x) is a k-dimensional vector whose elements correspond to the probability values of the k categories and sum to 1, and h_θ(x) has the form:
h_θ(x^(i)) = [p(y^(i)=1 | x^(i); θ), …, p(y^(i)=k | x^(i); θ)]^T = (1 / Σ_{j=1}^{k} e^(θ_j^T x^(i))) [e^(θ_1^T x^(i)), …, e^(θ_k^T x^(i))]^T    (13)
where θ_1, θ_2, …, θ_k ∈ R^(n+1) are the parameters of the model, assigned randomly at the start of training, and x^(i) denotes the ith pooling feature map data in the data matrix obtained after the Dropout layer;
(3-1-5-2) the Softmax regression layer evaluates the classification effect using a cost function J (θ):
where 1{y^(i) = j} is an indicator function whose rule is 1{a true expression} = 1, e.g. 1{1 + 1 = 3} = 0 and 1{1 + 1 = 2} = 1, and y^(i) denotes the emotion label value;
the above formula is derived to obtain a gradient formula:
where λ is the coefficient of the weight decay term in equation (15) and is a preset value;
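The Softmax regression layer of step (3-1-5) follows the usual softmax-with-weight-decay formulation; since the exact forms of equations (13)-(15) are not reproduced in this text, the sketch below uses that standard form, with labels shifted to 0..k-1 and lam standing for the weight decay coefficient λ.

```python
import numpy as np

def softmax_probs(theta, x):
    """Hypothesis h_theta(x): class probabilities for one flattened
    feature vector x given the parameter matrix theta (k x (n+1))."""
    scores = theta @ x
    scores -= scores.max()          # numerical stability; ratios are unchanged
    e = np.exp(scores)
    return e / e.sum()

def softmax_cost_grad(theta, X, y, lam):
    """Cost J(theta) with weight decay and its gradient over a batch
    X (m x (n+1)) with integer labels y in {0, ..., k-1}."""
    m, k = X.shape[0], theta.shape[0]
    P = np.array([softmax_probs(theta, x) for x in X])   # m x k probabilities
    Y = np.eye(k)[y]                                     # one-hot indicator 1{y_i = j}
    cost = -np.sum(Y * np.log(P)) / m + lam / 2.0 * np.sum(theta ** 2)
    grad = -(Y - P).T @ X / m + lam * theta              # k x (n+1) gradient
    return cost, grad
```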
(3-1-6) calculating the residuals of all layers and the gradient of the network parameter θ in the cost function J(W, b; x, y) of the Softmax regression by using the back-propagation algorithm, specifically comprising the following steps:
(3-1-6-1) if the l-th layer is fully connected to the l + 1-th layer, the residual calculation of the l-layer uses the following formula:
δ^(l) = ((W^(l))^T δ^(l+1)) · f′(z^(l))    (16)
the gradient calculation formula of the parameter W is:
the gradient calculation formula of the parameter b is as follows:
where δ^(l+1) is the residual of the (l+1)th layer of the network, J(W, b; x, y) is the cost function, (W, b) are the weight and threshold parameters, and (x, y) are the training data and label respectively;
(3-1-6-2) if the l-th layer is a convolutional layer and the l + 1-th layer is a sub-sampling layer, the residual is propagated by the following equation:
where k is the index of the convolution kernel, z_k^(l) denotes x_i * rot90(W_k, 2) + b_k, and f′(·) is the partial derivative of the Sigmoid-type activation function, of the form:
(3-1-7) according to the calculated gradient of θ, NAGD uses a momentum term γ v_{t-1} to update the parameter θ: computing θ - γ v_{t-1} gives an approximation of the future position of the parameter θ, and the NAGD update formulas are:
v_t = γ v_{t-1} + α ∇_θ J(θ - γ v_{t-1})    (21)
θ = θ - v_t    (22)
where ∇_θ J(θ - γ v_{t-1}) is the gradient with respect to the parameter θ computed from the training set (x^(i), y^(i)) at the look-ahead point, α is the learning rate, v_t is the current velocity vector, and v_{t-1} is the velocity vector of the previous iteration; α is initially set to 0.1, v_t is initialized to 0 with the same dimension as the parameter vector θ, and γ ∈ (0, 1]; γ is set to 0.5 at the start of training and increased to 0.95 after the first training iteration;
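Step (3-1-7) amounts to the usual Nesterov accelerated gradient update. A minimal sketch follows, with the gradient function (supplied by back-propagation, equations (16)-(21)) and the parameter shapes left as assumptions.

```python
import numpy as np

def nagd_step(theta, v_prev, grad_fn, alpha=0.1, gamma=0.5):
    """One NAGD update: evaluate the gradient at the look-ahead point
    theta - gamma * v_prev, then v_t = gamma * v_prev + alpha * grad
    and theta = theta - v_t (equation (22))."""
    lookahead = theta - gamma * v_prev
    v_t = gamma * v_prev + alpha * grad_fn(lookahead)
    return theta - v_t, v_t

# Usage sketch: v starts at np.zeros_like(theta), alpha = 0.1,
# gamma = 0.5 for the first iteration and 0.95 afterwards.
```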
(3-1-8) returning to step (3-1-1) until the set number of iterations is reached, completing the training optimization of the deep sparse convolutional neural network;
(3-2) inputting the PCA feature map of the emotion image to be recognized into a deep sparse convolution neural network, and recognizing and classifying the feature map:
(3-2-1) the PCA feature map of the emotion image to be recognized first passes through the convolutional layer and the sub-sampling layer: substituting x′_ZCAwhite for the input x_i in equation (9) gives the convolution feature map a′_{i,k} obtained by convolving the input PCA feature map of the emotion image to be recognized with the kth convolution kernel of the convolutional layer;
then substituting a′_{i,k} for a_{i,k} in equation (11) gives the pooling feature map c′ of the emotion image to be recognized, i.e. the high-level emotional features;
(3-2-2) when the pooled feature map c 'of the emotion image to be recognized continues to pass through a Dropout layer, carrying out average processing on c':
DropoutTest(c′)=(1-p)×c′ (23)
where DropoutTest(c′) denotes the data matrix obtained after the pooling feature map c′ of the emotion image to be recognized passes through the Dropout layer;
(3-2-3) using the hypothesis function h_θ(x) of the Softmax regression layer to calculate the probability value of c′ for each expression category j, and outputting the category j with the largest probability value, i.e. outputting the classification result.
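Putting the pieces together, the test-time path of step (3-2) could look like the sketch below; it reuses the conv_layer, average_pool and softmax_probs sketches above, and the flattening and bias-term handling are assumptions about how the pooled maps are fed to the Softmax layer.

```python
import numpy as np

def recognise_emotion(x_zca, kernels, biases, theta, p=0.5):
    """Test-time pipeline of step (3-2): convolution and pooling on the
    ZCA feature map, Dropout replaced by the (1 - p) scaling of
    equation (23), then Softmax; returns the label value 1..7."""
    conv_maps = conv_layer(x_zca, kernels, biases)            # a'_{i,k}
    pooled = np.stack([average_pool(m) for m in conv_maps])   # c'
    scaled = (1.0 - p) * pooled                               # DropoutTest, equation (23)
    features = np.append(scaled.ravel(), 1.0)                 # flatten and add a bias term
    return int(np.argmax(softmax_probs(theta, features))) + 1
```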
The face emotion databases used in the experiments with this method are the JAFFE and CK+ databases; partial sample images are shown in FIG. 5, where the first row contains JAFFE samples and the second row CK+ samples. The JAFFE database consists of 213 grayscale images of 7 basic expressions from 10 women; the image size is 256 × 256, and there are 2 to 4 images per expression per person. The CK+ database consists of 210 adults of different ethnicities and genders aged 18 to 50 and contains 326 labeled expression image sequences; each image is 640 × 490 and covers 7 expressions, namely anger, disgust, fear, happiness, sadness, surprise and contempt. Taking the expression in the calm state as the neutral expression and combining it with the peak image frames of the six expressions other than contempt yields 399 images of seven basic expressions.
80% of the JAFFE facial expression database was used as a training sample and 20% as a test sample. Fig. 6 shows a graph obtained by changing the size of the p value in the Dropout layer, and it can be seen that the training time gradually shortens and the recognition rate tends to increase as the p value increases. This indicates that when training a deep sparse convolutional neural network, selecting an appropriate p-value in the Dropout layer is beneficial to improve the generalization performance of the network and shorten the required training time. The influence of p on the training time and the recognition rate is comprehensively considered, and p is 0.5 as an optimal value, so that the time required by network training can be effectively reduced, the training efficiency is greatly improved, the network performance is also improved, and a good recognition effect can be obtained.
One problem common to deep learning algorithms is that they need a large amount of data to learn from during the training phase. However, the amount of data available in some existing public databases is not sufficient. Therefore, to increase the number of training samples without introducing duplicate samples, all original samples are symmetrically transformed, doubling the number of database samples; a comparison of images before and after the symmetric transformation is shown in FIG. 7. To verify the effectiveness of the added samples, a controlled-variable experiment was set up as follows: 80% of the JAFFE facial expression database was used as training samples and 20% as test samples. Keeping the parameters of the algorithm unchanged, the proposed deep sparse convolutional neural network was trained with two training sets, consisting of the original images and of the images with the symmetric transformations added, respectively; the test sets used in the two experiments are identical. Since the Dropout layer has a significant effect on the recognition result, in order to highlight the effect of the added samples, p was set to 0 in this experiment (masking the Dropout layer) and NAGD was used to optimize the network. The experimental results are shown in Table 1:
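The symmetric transformation used to double the training set is presumably a horizontal mirror of each original image (the patent only calls it a symmetric transformation); a minimal NumPy sketch under that assumption:

```python
import numpy as np

def mirror_augment(images):
    """Doubles the training set by adding the horizontally mirrored
    (symmetrically transformed) copy of every original sample."""
    return list(images) + [np.fliplr(img) for img in images]
```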
TABLE 1 comparison of the results
Table 2 shows the emotion recognition results obtained by training the deep sparse convolutional neural network with the conventional momentum-based stochastic gradient descent (MSGD) algorithm and with the NAGD algorithm. The experiments use the JAFFE database, one symmetrically transformed image is added per training sample, and p is set to 0.5. The experimental results show that training the network with NAGD yields more stable results and a better recognition effect than training it with MSGD.
TABLE 2 NAGD and MSGD test results
In order to verify the effectiveness of the algorithm provided by the invention, experiments are respectively carried out in JAFFE and CK + databases. 80% of the JAFFE facial expression database was used as a training sample and 20% as a test sample. The CK + database has wider ranges of ages, sexes and races than the JAFFE database, and in order to better learn various emotional characteristics of various people, the CK + database uses a training set with a larger proportion than JAFFE, namely 90% of images in the database are selected as the training set, and the rest 10% of images are selected as the test set. The experimental results obtained by adding a symmetrically transformed image to each picture in the training set, taking 1 and p as 0.5 are shown in table 3.
TABLE 3 identification results obtained on JAFFE and CK + databases
As can be seen from Table 3, the proposed algorithm achieves good recognition on both the JAFFE and CK+ databases, with a recognition rate of 97.62% on JAFFE and 95.12% on CK+. Dividing the training and recognition times given in the table by the number of images in the training/test set, the training time per image averages 0.6757 seconds and the recognition time per image averages 0.1258 seconds. The recognition rate and misclassification of each type of expression in the two experiments are shown in the confusion matrices of FIG. 8, in which AN., DI., FE., HA., NE., SA. and SU. correspond to the seven basic expressions anger, disgust, fear, happiness, neutral, sadness and surprise, respectively.
The invention builds a set of human-computer interaction system based on a human face emotion recognition algorithm, the human-computer interaction system mainly comprises a wheeled robot, an emotion calculation workstation, a router, data transmission equipment and the like, and a topological structure diagram is shown in figure 9. The system firstly acquires human face emotion image frame data through a Kinect configured on the wheeled robot, then transmits the data to an emotion calculation workstation, the workstation inputs the data to a trained human face emotion recognition system for recognition, and finally the workstation feeds back a recognition result to the wheeled robot, so that the wheeled robot can realize natural and harmonious interaction with a human.
A GUI interface is built on a system debugging interface of the human-computer interaction system for debugging through MATLAB 2016a, and a schematic diagram of the GUI interface is shown in FIG. 10. In a GUI system debugging interface, clicking an image preview button, calling a Kinect color camera by the system, and displaying a captured image on an image window on the left side of the GUI interface in real time; clicking an emotion recognition button, acquiring a currently captured image and displaying the currently captured image on an image window on the right side of a GUI interface, then manually acquiring coordinates of two eyes and a nose tip so as to correct and cut a face, and inputting the cut face image into a trained deep convolutional neural network for face emotion recognition; and feeding back the final recognition result to the GUI interface and displaying the final recognition result.
Two groups of image frames of the 7 basic expressions from 3 individuals are collected as a training set and input into the deep convolutional neural network for training; the image frames subsequently captured by the Kinect are then input into the trained network for recognition. Table 4 shows the online recognition results of the 7 basic expressions for the 3 persons.
table 4 application test results
As can be seen from the table, the average recognition rate of the three groups of experiments is 76.190%, which shows the prospect of the invention in practical application.
Claims (5)
1. A face emotion recognition method based on a deep sparse convolution neural network is characterized by comprising the following steps:
(1) preprocessing the emotion image: firstly, carrying out rotation correction and face cutting processing on an emotion image sample to be recognized, extracting an emotion characteristic key area, normalizing the image to be of a uniform size, and then carrying out histogram equalization on the emotion image to obtain a preprocessed emotion image;
(2) extracting emotional characteristics: firstly, extracting main component emotional characteristics of a preprocessed emotional image based on a PCA method to obtain characteristic data of different emotions; then, whitening the extracted feature data to obtain a PCA feature map of the emotion image to be recognized;
(3) and (3) emotion feature identification and classification: the method comprises the steps of constructing a deep sparse convolution neural network consisting of a convolution layer, a sub-sampling layer, a Dropout layer and a Softmax regression layer, firstly inputting a PCA characteristic diagram of a training set and a label value corresponding to emotion into the deep sparse convolution neural network, optimizing the deep sparse convolution neural network by adopting a Nesterov accelerated gradient descent algorithm, then inputting the PCA characteristic diagram of an emotion image to be recognized into the deep sparse convolution neural network, and outputting a recognition result, namely the label value corresponding to the emotion type.
2. The facial emotion recognition method based on the deep sparse convolutional neural network of claim 1, wherein: the emotion image preprocessing in the step (1) specifically comprises the following processes:
(1-1) calibrating three feature points of two eyes and a nose tip in the emotion image to obtain coordinate values of the three feature points;
(1-2) rotating the emotion image according to the coordinate values of the left eye and the right eye, enabling the two eyes to be on the same horizontal line, setting the interval between the two eyes to be d, and setting the middle point of the interval to be O;
(1-3) cutting the face according to the facial feature points: with O as the reference, a rectangular region extending d to the left and to the right in the horizontal direction and 0.5d upward and 1.5d downward in the vertical direction is cut out; this rectangular region is the face emotion sub-region;
(1-4) transforming the scale of the face emotion subarea into a uniform size of 128 x 128 pixels;
and (1-5) carrying out histogram equalization on the emotion subarea of the human face to obtain a preprocessed emotion image.
3. The facial emotion recognition method based on the deep sparse convolutional neural network of claim 1, wherein: the extraction of the emotional features in the step (2) specifically comprises the following steps:
(2-1) performing mean normalization by subtracting the average brightness value mu of the emotion image to make the characteristic mean values of data in the emotion image be around 0, specifically comprising the following steps:
the preprocessed emotion image data of size 128 × 128 are stored in a 128 × 128 matrix, i.e. {x′^(1), x′^(2), …, x′^(n)}, x′^(i) ∈ R^n with n = 128, and each preprocessed emotion image is zero-averaged using equations (1) and (2):
μ = (1/n) Σ_{i=1}^{n} x′^(i)    (1)
x′^(i) = x′^(i) - μ    (2)
(2-2) calculating the eigenvector matrix U of the covariance matrix Σ of the zero-averaged emotion image, where Σ is computed as:
Σ = (1/n) Σ_{i=1}^{n} (x′^(i)) (x′^(i))^T    (3)
the emotion image pixel values x′ are then expressed in the basis {u_1, u_2, …, u_n} formed by the eigenvectors in U:
x′_rot = U^T x′ = [u_1^T x′, u_2^T x′, …, u_n^T x′]^T    (4)
(2-3) selecting the first k principal components of x′_rot so as to retain 99% of the variance, i.e. the minimum k satisfying equation (5):
(Σ_{j=1}^{k} λ_j) / (Σ_{j=1}^{n} λ_j) ≥ 0.99    (5)
where λ_j denotes the jth eigenvalue corresponding to the eigenvectors in U;
(2-4) setting all components of x′_rot other than the k retained principal components to zero to obtain x̃′, the approximate representation of x′_rot:
x̃′ = [x′_rot,1, …, x′_rot,k, 0, …, 0]^T ≈ [x′_rot,1, …, x′_rot,k, x′_rot,k+1, …, x′_rot,n]^T = x′_rot    (6)
(2-5) scaling x̃′ so that the individual features are decorrelated and all have unit variance:
x′_PCAwhite,i = x̃′_i / √(λ_i + ε)    (7)
(2-6) transforming the feature vector U by adopting ZCA whitening to change the covariance matrix of the emotion image into a unit matrix I:
x′ZCAwhite=Ux′PCAwhite(8)
where x'_ZCAwhite is the PCA feature map of the emotion image to be recognized.
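For illustration only, a minimal NumPy sketch of the whitening pipeline of equations (4)-(8) could look as follows. The function name zca_whiten, the row-per-image data layout, and the epsilon value are assumptions of this sketch, not part of the claimed method.

```python
import numpy as np

def zca_whiten(X, var_retained=0.99, eps=1e-5):
    """PCA rotation, top-k selection, PCA whitening and ZCA whitening (eqs. 4-8).

    X: (m, n) matrix with one zero-mean emotion image vector per row.
    Returns the ZCA-whitened data and the number k of retained components.
    """
    m, n = X.shape
    sigma = X.T @ X / m                       # covariance matrix of the data
    U, S, _ = np.linalg.svd(sigma)            # eigenvectors U, eigenvalues S (descending)

    x_rot = X @ U                             # eq. (4): rotate into the PCA basis
    k = int(np.searchsorted(np.cumsum(S) / np.sum(S), var_retained)) + 1  # eq. (5)

    x_tilde = x_rot.copy()                    # eq. (6): zero all but the first k components
    x_tilde[:, k:] = 0.0

    x_pca_white = x_tilde / np.sqrt(S + eps)  # eq. (7): rescale each feature to unit variance
    x_zca_white = x_pca_white @ U.T           # eq. (8): rotate back (ZCA whitening)
    return x_zca_white, k
```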
4. The facial emotion recognition method based on the deep sparse convolutional neural network of claim 1, wherein: the label values 1-7 in step (3) correspond one-to-one to the 7 emotion categories, namely anger, disgust, fear, happiness, neutrality, sadness and surprise.
5. The facial emotion recognition method based on the deep sparse convolutional neural network of claim 3, wherein: the emotion feature identification and classification in the step (3) specifically comprises the following processes:
(3-1) creating a deep sparse convolutional neural network consisting, in order, of a convolutional layer, a sub-sampling layer, a Dropout layer and a Softmax regression layer, and inputting the training set data into the deep sparse convolutional neural network, where the training set data comprises the PCA feature maps of the training set and the label values of the corresponding emotions, namely {(x_1, y_1), ..., (x_m, y_m)} with y_i ∈ {1, 2, ..., k}, x_i being the ith PCA feature map of the training set, y_i the emotion label value corresponding to x_i, and i ∈ {1, 2, ..., m}; the deep sparse convolutional neural network is trained iteratively with the NAGD algorithm, and the iterative training comprises the following processes:
(3-1-1) randomly shuffling the training set data and dividing it into groups of equal size, then inputting each group in turn into the deep sparse convolutional neural network;
(3-1-2) each group of training set data first passes through the convolutional layer, which has 100 convolution kernels of size 29 × 29 with a stride of 1; the deep sparse convolutional neural network mines local correlation information in the PCA feature maps of the training set through the convolution kernels, and the convolutional layer is implemented as follows:
a_{i,k} = f(x_{i} * \mathrm{rot90}(W_{k}, 2) + b_{k})    (9)
where a_{i,k} is the convolution feature map obtained by convolving the ith PCA feature map x_i of the training set with the kth convolution kernel of the convolutional layer, i.e. a 'valid' convolution operation, W_k is the weight of the kth convolution kernel, b_k is the bias corresponding to the kth convolution kernel, and f(·) is a Sigmoid-type activation function:
f(x) = \frac{1}{1 + e^{-x}}    (10)
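A minimal sketch of the 'valid' convolution of equations (9)-(10) is given below; the helper names conv_feature_map and sigmoid are hypothetical, and the code simply slides the 180°-rotated kernel over the input with stride 1 and applies the activation.

```python
import numpy as np

def sigmoid(x):
    # eq. (10): Sigmoid-type activation
    return 1.0 / (1.0 + np.exp(-x))

def conv_feature_map(x_i, W_k, b_k):
    """eq. (9): a_{i,k} = f(x_i * rot90(W_k, 2) + b_k), 'valid' region only."""
    W_flipped = np.rot90(W_k, 2)                       # rotate the kernel by 180 degrees
    H, W = x_i.shape
    kh, kw = W_flipped.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for r in range(out.shape[0]):                      # stride 1 over the 'valid' positions
        for c in range(out.shape[1]):
            out[r, c] = np.sum(x_i[r:r + kh, c:c + kw] * W_flipped)
    return sigmoid(out + b_k)
```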
(3-1-3) inputting the convolution feature maps generated by the convolutional layer into the sub-sampling layer, which uses average pooling with a pooling dimension of 4 and a stride of 4; after the sub-sampling layer, the height and width of each pooled feature map become one quarter of those of the convolution feature map, while the number of feature maps is unchanged; the average pooling uses the following formula:
c_{j} = f\left(a_{i,k} * \frac{1}{p^{2}}\right)    (11)
where c_j is the jth pooled feature map generated by the sub-sampling layer, and p is the average pooling dimension;
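A minimal sketch of the non-overlapping p × p average pooling of equation (11), assuming the feature map sides are divisible by p (the activation f of equation (11) is omitted here for brevity):

```python
import numpy as np

def average_pool(a, p=4):
    """Non-overlapping p x p average pooling with stride p (cf. eq. (11))."""
    H, W = a.shape
    a = a[:H - H % p, :W - W % p]                       # crop so both sides divide by p
    pooled = a.reshape(H // p, p, W // p, p).mean(axis=(1, 3))
    return pooled                                       # each side shrinks by a factor of p
```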
(3-1-4) using the Dropout layer to reduce network overfitting: the data passing through the Dropout layer, i.e. the pooled feature maps generated in step (3-1-3), are randomly deactivated (set to zero) with a set probability while the remaining data are kept, and the calculation process is as follows:
\mathrm{DropoutTrain}(x) = \mathrm{RandomZero}(p) \times x    (12)
where DropoutTrain(x) is the data matrix output by the Dropout layer during the training stage, and RandomZero(p) sets each value of the input data matrix x to 0 with the set probability p;
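A minimal sketch of the training-time Dropout of equation (12); the function name dropout_train and the default probability are assumptions of this sketch.

```python
import numpy as np

def dropout_train(x, p=0.5, rng=None):
    """eq. (12): zero each entry of x with probability p during training."""
    rng = np.random.default_rng() if rng is None else rng
    mask = (rng.random(x.shape) >= p).astype(x.dtype)   # RandomZero(p): keep with prob. 1-p
    return mask * x
```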
(3-1-5) classifying and recognizing the data matrix obtained after the Dropout layer using the Softmax regression layer:
(3-1-5-1) using the hypothesis function h_θ(x) to calculate the probability p(y = j | x) that the data matrix obtained after the Dropout layer belongs to each expression category j; h_θ(x) is a k-dimensional vector whose elements are the probabilities of the k classes and sum to 1, and h_θ(x) takes the form:
h_{\theta}(x^{(i)}) = \begin{bmatrix} p(y^{(i)}=1 \mid x^{(i)}; \theta) \\ p(y^{(i)}=2 \mid x^{(i)}; \theta) \\ \vdots \\ p(y^{(i)}=k \mid x^{(i)}; \theta) \end{bmatrix} = \frac{1}{\sum_{j=1}^{k} e^{\theta_{j}^{T} x^{(i)}}} \begin{bmatrix} e^{\theta_{1}^{T} x^{(i)}} \\ e^{\theta_{2}^{T} x^{(i)}} \\ \vdots \\ e^{\theta_{k}^{T} x^{(i)}} \end{bmatrix}    (13)
where θ_1, θ_2, ..., θ_k ∈ R^{n+1} are the model parameters, assigned randomly at the start of training; x^{(i)} denotes the ith pooled feature map data in the data matrix obtained after the Dropout layer;
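Equation (13) can be sketched numerically as below; the subtraction of the maximum score is a standard numerical-stability device added by this sketch and is not part of the claim.

```python
import numpy as np

def softmax_hypothesis(theta, x):
    """eq. (13): h_theta(x) for one input vector x and a (k, n+1) parameter matrix theta."""
    scores = theta @ x                       # theta_j^T x for every class j
    scores = scores - scores.max()           # stabilise the exponentials (sketch-only detail)
    e = np.exp(scores)
    return e / e.sum()                       # k probabilities that sum to 1
```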
(3-1-5-2) the Softmax regression layer evaluates the classification effect using the cost function J(θ):
J(\theta) = -\frac{1}{m}\left[\sum_{i=1}^{m}\sum_{j=1}^{k} 1\{y^{(i)}=j\} \log \frac{e^{\theta_{j}^{T} x^{(i)}}}{\sum_{l=1}^{k} e^{\theta_{l}^{T} x^{(i)}}}\right] + \frac{\lambda}{2}\sum_{i=1}^{k}\sum_{j=0}^{n} \theta_{ij}^{2}    (14)
where 1{y^{(i)} = j} is the indicator function, whose value is 1 when the expression inside the braces is true and 0 otherwise, e.g. 1{1+1=3} = 0 and 1{1+1=2} = 1; y^{(i)} denotes the emotion label value;
differentiating the above formula gives the gradient formula:
\nabla_{\theta_{j}} J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[x^{(i)}\left(1\{y^{(i)}=j\} - p(y^{(i)}=j \mid x^{(i)}; \theta)\right)\right] + \lambda\theta_{j}    (15)
where λ in equation (15) is the weight decay factor, a preset value;
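A vectorised sketch of the cost (14) and gradient (15), assuming X is an (m, n+1) array with one example per row, y is an integer array of labels 1..k, and theta is (k, n+1); the max-subtraction is again a sketch-only stability device.

```python
import numpy as np

def softmax_cost_and_grad(theta, X, y, lam):
    """eq. (14)-(15): weight-decayed Softmax cost J(theta) and its gradient."""
    m, k = X.shape[0], theta.shape[0]
    scores = X @ theta.T                               # (m, k) class scores
    scores = scores - scores.max(axis=1, keepdims=True)
    probs = np.exp(scores)
    probs /= probs.sum(axis=1, keepdims=True)          # p(y = j | x; theta)

    onehot = np.zeros((m, k))
    onehot[np.arange(m), y - 1] = 1.0                  # indicator 1{y^(i) = j}, labels 1..k

    cost = -np.sum(onehot * np.log(probs)) / m + lam / 2.0 * np.sum(theta ** 2)  # eq. (14)
    grad = -(onehot - probs).T @ X / m + lam * theta                             # eq. (15)
    return cost, grad
```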
(3-1-6) calculating the residuals of all layers and the gradients of the network parameters θ of the cost function J(W, b; x, y) in the Softmax regression using the back-propagation algorithm, specifically comprising the following steps:
(3-1-6-1) if the lth layer is fully connected to the (l+1)th layer, the residual of the lth layer is calculated by the following formula:
\delta^{(l)} = \left((W^{(l)})^{T} \delta^{(l+1)}\right) \cdot f'(z^{(l)})    (16)
the gradient calculation formula of the parameter W is:
\nabla_{W^{(l)}} J(W, b; x, y) = \delta^{(l+1)} (a^{(l)})^{T}    (17)
the gradient calculation formula of the parameter b is as follows:
\nabla_{b^{(l)}} J(W, b; x, y) = \delta^{(l+1)}    (18)
where δ^{(l+1)} is the residual of the (l+1)th layer of the network, J(W, b; x, y) is the cost function, (W, b) are the weight and bias parameters, and (x, y) are the training data and label respectively;
(3-1-6-2) if the l-th layer is a convolutional layer and the l + 1-th layer is a sub-sampling layer, the residual is propagated by the following equation:
\delta_{k}^{(l)} = \mathrm{upsample}\left((W_{k}^{(l)})^{T} \delta_{k}^{(l+1)}\right) \cdot f'(z_{k}^{(l)})    (19)
where k is the index of the convolution kernel, z_k^{(l)} denotes x_i * rot90(W_k, 2) + b_k, and f'(·) is the derivative of the Sigmoid-type activation function, of the form:
f'(x) = \frac{e^{-x}}{(1 + e^{-x})^{2}} = f(x)(1 - f(x))    (20)
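As an illustration of equations (19)-(20), the upsample step that pushes a residual back through p × p average pooling can be sketched as follows; the transposed-weight factor of equation (19) is dropped here because the average-pooling layer of this claim has no learned weights, which is a simplifying assumption of the sketch.

```python
import numpy as np

def f_prime(z):
    # eq. (20): derivative of the Sigmoid, f'(z) = f(z)(1 - f(z))
    fz = 1.0 / (1.0 + np.exp(-z))
    return fz * (1.0 - fz)

def upsample(delta_pooled, p=4):
    """Expand a pooled residual to the pre-pooling size, spreading each value
    uniformly over its p x p block (Kronecker product with a 1/p^2 block)."""
    return np.kron(delta_pooled, np.ones((p, p)) / (p * p))

def conv_layer_residual(delta_next, z, p=4):
    """eq. (19) without the weight factor: delta^(l) = upsample(delta^(l+1)) . f'(z^(l))."""
    return upsample(delta_next, p) * f_prime(z)
```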
(3-1-7) according to the calculated gradient of θ, NAGD uses a momentum term γv_{t-1} to update the parameter θ, and computes θ - γv_{t-1} to obtain an approximation of the future position of the parameter θ; the NAGD update formulas are:
v_{t} = \gamma v_{t-1} + \alpha \nabla_{\theta} J(\theta - \gamma v_{t-1}; x^{(i)}, y^{(i)})    (21)
\theta = \theta - v_{t}    (22)
where ∇_θ J(θ; x^{(i)}, y^{(i)}) is the gradient of the parameter θ computed from the training sample (x^{(i)}, y^{(i)}), α is the learning rate, v_t is the current velocity vector and v_{t-1} is the velocity vector of the previous iteration; α is initially set to 0.1, v_t is initialized to 0 with the same dimension as the parameter vector θ, and γ ∈ (0, 1]; γ is set to 0.5 at the start of training and increased to 0.95 after the first training iteration is finished;
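A minimal sketch of one Nesterov-style update per equations (21)-(22); grad_fn is a hypothetical callback that returns ∇_θ J evaluated at the lookahead point on the current mini-batch.

```python
import numpy as np

def nagd_step(theta, v, grad_fn, alpha=0.1, gamma=0.5):
    """eq. (21)-(22): one Nesterov accelerated gradient descent update."""
    lookahead = theta - gamma * v                      # approximate future position of theta
    v_new = gamma * v + alpha * grad_fn(lookahead)     # eq. (21)
    theta_new = theta - v_new                          # eq. (22)
    return theta_new, v_new
```

In a training loop, gamma would be raised from 0.5 to 0.95 after the first training iteration, as stated above.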
(3-1-8) returning to step (3-1-1) until the set number of iterations is reached, completing the training optimization of the deep sparse convolutional neural network;
(3-2) inputting the PCA feature map of the emotion image to be recognized into the trained deep sparse convolutional neural network, and recognizing and classifying the feature map:
(3-2-1) the PCA feature map of the emotion image to be recognized first passes through the convolutional layer and the sub-sampling layer: substituting x'_ZCAwhite for the input x_i in equation (9) yields the convolution feature map a'_{i,k} obtained by convolving the PCA feature map of the emotion image to be recognized with the kth convolution kernel of the convolutional layer;
then substituting a'_{i,k} for a_{i,k} in equation (11) yields the pooled feature map c' of the emotion image to be recognized, i.e. the high-level emotion features;
(3-2-2) when the pooled feature map c' of the emotion image to be recognized passes through the Dropout layer, averaging is applied to c' by scaling it with the retention probability:
\mathrm{DropoutTest}(c') = (1 - p) \times c'    (23)
where DropoutTest(c') is the data matrix obtained after the pooled feature map c' of the emotion image to be recognized passes through the Dropout layer;
(3-2-3) using the hypothesis function h_θ(x) of the Softmax regression layer to calculate the probability that c' belongs to each expression category j, and outputting the category j with the maximum probability as the classification result.
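Putting steps (3-2-2) and (3-2-3) together, the test-time path could be sketched as below; dropout_test implements equation (23), and the flattening of c' and the parameter matrix theta are assumptions of this sketch.

```python
import numpy as np

def dropout_test(c, p=0.5):
    # eq. (23): scale the pooled features by (1 - p) at test time
    return (1.0 - p) * c

def predict_emotion(c, theta, p=0.5):
    """Steps (3-2-2)/(3-2-3): scale, score with Softmax, return the label 1..7."""
    x = dropout_test(c, p).ravel()            # flatten the pooled feature map
    scores = theta @ x                        # theta_j^T x for each emotion class
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    return int(np.argmax(probs)) + 1          # category j with the largest probability
```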
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710714001.6A CN107506722A (en) | 2017-08-18 | 2017-08-18 | One kind is based on depth sparse convolution neutral net face emotion identification method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107506722A true CN107506722A (en) | 2017-12-22 |
Family
ID=60692255
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710714001.6A Pending CN107506722A (en) | 2017-08-18 | 2017-08-18 | One kind is based on depth sparse convolution neutral net face emotion identification method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107506722A (en) |
Cited By (38)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108154504A (en) * | 2017-12-25 | 2018-06-12 | 浙江工业大学 | Method for detecting surface defects of steel plate based on convolutional neural network |
CN108182260A (en) * | 2018-01-03 | 2018-06-19 | 华南理工大学 | A kind of Multivariate Time Series sorting technique based on semantic selection |
CN108460329A (en) * | 2018-01-15 | 2018-08-28 | 任俊芬 | A kind of face gesture cooperation verification method based on deep learning detection |
CN108491835A (en) * | 2018-06-12 | 2018-09-04 | 常州大学 | Binary channels convolutional neural networks towards human facial expression recognition |
CN108597539A (en) * | 2018-02-09 | 2018-09-28 | 桂林电子科技大学 | Speech-emotion recognition method based on parameter migration and sound spectrograph |
CN108614875A (en) * | 2018-04-26 | 2018-10-02 | 北京邮电大学 | Chinese emotion tendency sorting technique based on global average pond convolutional neural networks |
CN108711150A (en) * | 2018-05-22 | 2018-10-26 | 电子科技大学 | A kind of end-to-end pavement crack detection recognition method based on PCA |
CN108764128A (en) * | 2018-05-25 | 2018-11-06 | 华中科技大学 | A kind of video actions recognition methods based on sparse time slice network |
CN108806667A (en) * | 2018-05-29 | 2018-11-13 | 重庆大学 | The method for synchronously recognizing of voice and mood based on neural network |
CN108846380A (en) * | 2018-04-09 | 2018-11-20 | 北京理工大学 | A kind of facial expression recognizing method based on cost-sensitive convolutional neural networks |
CN108875904A (en) * | 2018-04-04 | 2018-11-23 | 北京迈格威科技有限公司 | Image processing method, image processing apparatus and computer readable storage medium |
CN108898105A (en) * | 2018-06-29 | 2018-11-27 | 成都大学 | It is a kind of based on depth characteristic and it is sparse compression classification face identification method |
CN108985457A (en) * | 2018-08-22 | 2018-12-11 | 北京大学 | A kind of deep neural network construction design method inspired by optimization algorithm |
CN109033994A (en) * | 2018-07-03 | 2018-12-18 | 辽宁工程技术大学 | A kind of facial expression recognizing method based on convolutional neural networks |
CN109409219A (en) * | 2018-09-19 | 2019-03-01 | 湖北工业大学 | Indoor occupant locating and tracking algorithm based on depth convolutional network |
CN109635790A (en) * | 2019-01-28 | 2019-04-16 | 杭州电子科技大学 | A kind of pedestrian's abnormal behaviour recognition methods based on 3D convolution |
CN109685126A (en) * | 2018-12-17 | 2019-04-26 | 北斗航天卫星应用科技集团有限公司 | Image classification method and image classification system based on depth convolutional neural networks |
CN109815953A (en) * | 2019-01-30 | 2019-05-28 | 电子科技大学 | One kind being based on vehicle annual test target vehicle identification matching system |
CN109934132A (en) * | 2019-02-28 | 2019-06-25 | 北京理工大学珠海学院 | Face identification method, system and storage medium based on random drop convolved data |
CN109993290A (en) * | 2017-12-30 | 2019-07-09 | 北京中科寒武纪科技有限公司 | Integrated circuit chip device and Related product |
CN110046223A (en) * | 2019-03-13 | 2019-07-23 | 重庆邮电大学 | Film review sentiment analysis method based on modified convolutional neural networks model |
CN110210380A (en) * | 2019-05-30 | 2019-09-06 | 盐城工学院 | The analysis method of personality is generated based on Expression Recognition and psychology test |
CN110223712A (en) * | 2019-06-05 | 2019-09-10 | 西安交通大学 | A kind of music emotion recognition method based on two-way convolution loop sparse network |
CN110276189A (en) * | 2019-06-27 | 2019-09-24 | 电子科技大学 | A kind of method for authenticating user identity based on gait information |
US20200011668A1 (en) * | 2018-07-09 | 2020-01-09 | Samsung Electronics Co., Ltd. | Simultaneous location and mapping (slam) using dual event cameras |
CN110705621A (en) * | 2019-09-25 | 2020-01-17 | 北京影谱科技股份有限公司 | Food image identification method and system based on DCNN and food calorie calculation method |
CN110765809A (en) * | 2018-07-25 | 2020-02-07 | 北京大学 | Facial expression classification method and device and emotion intelligent robot |
CN110807420A (en) * | 2019-10-31 | 2020-02-18 | 天津大学 | Facial expression recognition method integrating feature extraction and deep learning |
WO2020097936A1 (en) * | 2018-11-16 | 2020-05-22 | 华为技术有限公司 | Neural network compressing method and device |
WO2020164271A1 (en) * | 2019-02-13 | 2020-08-20 | 平安科技(深圳)有限公司 | Pooling method and device for convolutional neural network, storage medium and computer device |
CN112036433A (en) * | 2020-07-10 | 2020-12-04 | 天津城建大学 | CNN-based Wi-Move behavior sensing method |
CN112149449A (en) * | 2019-06-26 | 2020-12-29 | 北京华捷艾米科技有限公司 | Face attribute recognition method and system based on deep learning |
CN112329701A (en) * | 2020-11-20 | 2021-02-05 | 北京联合大学 | Facial expression recognition method for low-resolution images |
CN112613552A (en) * | 2020-12-18 | 2021-04-06 | 北京工业大学 | Convolutional neural network emotion image classification method combining emotion category attention loss |
CN113673567A (en) * | 2021-07-20 | 2021-11-19 | 华南理工大学 | Panorama emotion recognition method and system based on multi-angle subregion self-adaption |
US11651202B2 (en) | 2017-12-30 | 2023-05-16 | Cambricon Technologies Corporation Limited | Integrated circuit chip device and related product |
US11704544B2 (en) | 2017-12-30 | 2023-07-18 | Cambricon Technologies Corporation Limited | Integrated circuit chip device and related product |
US11734548B2 (en) | 2017-12-30 | 2023-08-22 | Cambricon Technologies Corporation Limited | Integrated circuit chip device and related product |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106778444A (en) * | 2015-11-23 | 2017-05-31 | 广州华久信息科技有限公司 | A kind of expression recognition method based on multi views convolutional neural networks |
CN105512624A (en) * | 2015-12-01 | 2016-04-20 | 天津中科智能识别产业技术研究院有限公司 | Smile face recognition method and device for human face image |
CN105447473A (en) * | 2015-12-14 | 2016-03-30 | 江苏大学 | PCANet-CNN-based arbitrary attitude facial expression recognition method |
CN106503654A (en) * | 2016-10-24 | 2017-03-15 | 中国地质大学(武汉) | A kind of face emotion identification method based on the sparse autoencoder network of depth |
CN106529503A (en) * | 2016-11-30 | 2017-03-22 | 华南理工大学 | Method for recognizing face emotion by using integrated convolutional neural network |
CN106778657A (en) * | 2016-12-28 | 2017-05-31 | 南京邮电大学 | Neonatal pain expression classification method based on convolutional neural networks |
Non-Patent Citations (6)
Title |
---|
ALI MOLLAHOSSEINI 等: "Going Deeper in Facial Expression Recognition using Deep Neural Networks", 《2016 IEEE WINTER CONFERENCE ON APPLICATION OF COMPUTER VISION》 * |
YU PING et al.: "Convolutional neural network image recognition algorithm based on matrix 2-norm pooling", Journal of Graphics * |
WU ZHENGWEN: "Research and application of convolutional neural networks in image classification", China Master's Theses Full-text Database, Information Science and Technology * |
SUN XIAO et al.: "Facial expression recognition based on ROI-KNN convolutional neural network", Acta Automatica Sinica * |
LI HUIKE: "Analysis and research of facial expression recognition methods", China Master's Theses Full-text Database, Information Science and Technology * |
CHU MINNAN: "Research on image classification technology based on convolutional neural networks", China Master's Theses Full-text Database, Information Science and Technology * |
Cited By (56)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108154504A (en) * | 2017-12-25 | 2018-06-12 | 浙江工业大学 | Method for detecting surface defects of steel plate based on convolutional neural network |
US11704544B2 (en) | 2017-12-30 | 2023-07-18 | Cambricon Technologies Corporation Limited | Integrated circuit chip device and related product |
CN109993290B (en) * | 2017-12-30 | 2021-08-06 | 中科寒武纪科技股份有限公司 | Integrated circuit chip device and related product |
US11734548B2 (en) | 2017-12-30 | 2023-08-22 | Cambricon Technologies Corporation Limited | Integrated circuit chip device and related product |
US11710031B2 (en) | 2017-12-30 | 2023-07-25 | Cambricon Technologies Corporation Limited | Parallel processing circuits for neural networks |
US11651202B2 (en) | 2017-12-30 | 2023-05-16 | Cambricon Technologies Corporation Limited | Integrated circuit chip device and related product |
CN109993290A (en) * | 2017-12-30 | 2019-07-09 | 北京中科寒武纪科技有限公司 | Integrated circuit chip device and Related product |
CN108182260A (en) * | 2018-01-03 | 2018-06-19 | 华南理工大学 | A kind of Multivariate Time Series sorting technique based on semantic selection |
CN108460329A (en) * | 2018-01-15 | 2018-08-28 | 任俊芬 | A kind of face gesture cooperation verification method based on deep learning detection |
CN108597539B (en) * | 2018-02-09 | 2021-09-03 | 桂林电子科技大学 | Speech emotion recognition method based on parameter migration and spectrogram |
CN108597539A (en) * | 2018-02-09 | 2018-09-28 | 桂林电子科技大学 | Speech-emotion recognition method based on parameter migration and sound spectrograph |
CN108875904A (en) * | 2018-04-04 | 2018-11-23 | 北京迈格威科技有限公司 | Image processing method, image processing apparatus and computer readable storage medium |
CN108846380A (en) * | 2018-04-09 | 2018-11-20 | 北京理工大学 | A kind of facial expression recognizing method based on cost-sensitive convolutional neural networks |
CN108846380B (en) * | 2018-04-09 | 2021-08-24 | 北京理工大学 | Facial expression recognition method based on cost-sensitive convolutional neural network |
CN108614875B (en) * | 2018-04-26 | 2022-06-07 | 北京邮电大学 | Chinese emotion tendency classification method based on global average pooling convolutional neural network |
CN108614875A (en) * | 2018-04-26 | 2018-10-02 | 北京邮电大学 | Chinese emotion tendency sorting technique based on global average pond convolutional neural networks |
CN108711150A (en) * | 2018-05-22 | 2018-10-26 | 电子科技大学 | A kind of end-to-end pavement crack detection recognition method based on PCA |
CN108711150B (en) * | 2018-05-22 | 2022-03-25 | 电子科技大学 | End-to-end pavement crack detection and identification method based on PCA |
CN108764128A (en) * | 2018-05-25 | 2018-11-06 | 华中科技大学 | A kind of video actions recognition methods based on sparse time slice network |
CN108806667A (en) * | 2018-05-29 | 2018-11-13 | 重庆大学 | The method for synchronously recognizing of voice and mood based on neural network |
CN108491835A (en) * | 2018-06-12 | 2018-09-04 | 常州大学 | Binary channels convolutional neural networks towards human facial expression recognition |
CN108898105A (en) * | 2018-06-29 | 2018-11-27 | 成都大学 | It is a kind of based on depth characteristic and it is sparse compression classification face identification method |
CN109033994B (en) * | 2018-07-03 | 2021-08-10 | 辽宁工程技术大学 | Facial expression recognition method based on convolutional neural network |
CN109033994A (en) * | 2018-07-03 | 2018-12-18 | 辽宁工程技术大学 | A kind of facial expression recognizing method based on convolutional neural networks |
US10948297B2 (en) * | 2018-07-09 | 2021-03-16 | Samsung Electronics Co., Ltd. | Simultaneous location and mapping (SLAM) using dual event cameras |
US20200011668A1 (en) * | 2018-07-09 | 2020-01-09 | Samsung Electronics Co., Ltd. | Simultaneous location and mapping (slam) using dual event cameras |
US11668571B2 (en) | 2018-07-09 | 2023-06-06 | Samsung Electronics Co., Ltd. | Simultaneous localization and mapping (SLAM) using dual event cameras |
CN110765809A (en) * | 2018-07-25 | 2020-02-07 | 北京大学 | Facial expression classification method and device and emotion intelligent robot |
CN108985457B (en) * | 2018-08-22 | 2021-11-19 | 北京大学 | Deep neural network structure design method inspired by optimization algorithm |
CN108985457A (en) * | 2018-08-22 | 2018-12-11 | 北京大学 | A kind of deep neural network construction design method inspired by optimization algorithm |
CN109409219A (en) * | 2018-09-19 | 2019-03-01 | 湖北工业大学 | Indoor occupant locating and tracking algorithm based on depth convolutional network |
WO2020097936A1 (en) * | 2018-11-16 | 2020-05-22 | 华为技术有限公司 | Neural network compressing method and device |
CN113302657B (en) * | 2018-11-16 | 2024-04-26 | 华为技术有限公司 | Neural network compression method and device |
CN113302657A (en) * | 2018-11-16 | 2021-08-24 | 华为技术有限公司 | Neural network compression method and device |
CN109685126A (en) * | 2018-12-17 | 2019-04-26 | 北斗航天卫星应用科技集团有限公司 | Image classification method and image classification system based on depth convolutional neural networks |
CN109635790A (en) * | 2019-01-28 | 2019-04-16 | 杭州电子科技大学 | A kind of pedestrian's abnormal behaviour recognition methods based on 3D convolution |
CN109815953A (en) * | 2019-01-30 | 2019-05-28 | 电子科技大学 | One kind being based on vehicle annual test target vehicle identification matching system |
WO2020164271A1 (en) * | 2019-02-13 | 2020-08-20 | 平安科技(深圳)有限公司 | Pooling method and device for convolutional neural network, storage medium and computer device |
CN109934132A (en) * | 2019-02-28 | 2019-06-25 | 北京理工大学珠海学院 | Face identification method, system and storage medium based on random drop convolved data |
CN110046223B (en) * | 2019-03-13 | 2021-05-18 | 重庆邮电大学 | Film evaluation emotion analysis method based on improved convolutional neural network model |
CN110046223A (en) * | 2019-03-13 | 2019-07-23 | 重庆邮电大学 | Film review sentiment analysis method based on modified convolutional neural networks model |
CN110210380A (en) * | 2019-05-30 | 2019-09-06 | 盐城工学院 | The analysis method of personality is generated based on Expression Recognition and psychology test |
CN110210380B (en) * | 2019-05-30 | 2023-07-25 | 盐城工学院 | Analysis method for generating character based on expression recognition and psychological test |
CN110223712B (en) * | 2019-06-05 | 2021-04-20 | 西安交通大学 | Music emotion recognition method based on bidirectional convolution cyclic sparse network |
CN110223712A (en) * | 2019-06-05 | 2019-09-10 | 西安交通大学 | A kind of music emotion recognition method based on two-way convolution loop sparse network |
CN112149449A (en) * | 2019-06-26 | 2020-12-29 | 北京华捷艾米科技有限公司 | Face attribute recognition method and system based on deep learning |
CN110276189A (en) * | 2019-06-27 | 2019-09-24 | 电子科技大学 | A kind of method for authenticating user identity based on gait information |
CN110705621A (en) * | 2019-09-25 | 2020-01-17 | 北京影谱科技股份有限公司 | Food image identification method and system based on DCNN and food calorie calculation method |
CN110807420A (en) * | 2019-10-31 | 2020-02-18 | 天津大学 | Facial expression recognition method integrating feature extraction and deep learning |
CN112036433B (en) * | 2020-07-10 | 2022-11-04 | 天津城建大学 | CNN-based Wi-Move behavior sensing method |
CN112036433A (en) * | 2020-07-10 | 2020-12-04 | 天津城建大学 | CNN-based Wi-Move behavior sensing method |
CN112329701A (en) * | 2020-11-20 | 2021-02-05 | 北京联合大学 | Facial expression recognition method for low-resolution images |
CN112613552B (en) * | 2020-12-18 | 2024-05-28 | 北京工业大学 | Convolutional neural network emotion image classification method combined with emotion type attention loss |
CN112613552A (en) * | 2020-12-18 | 2021-04-06 | 北京工业大学 | Convolutional neural network emotion image classification method combining emotion category attention loss |
CN113673567A (en) * | 2021-07-20 | 2021-11-19 | 华南理工大学 | Panorama emotion recognition method and system based on multi-angle subregion self-adaption |
CN113673567B (en) * | 2021-07-20 | 2023-07-21 | 华南理工大学 | Panorama emotion recognition method and system based on multi-angle sub-region self-adaption |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107506722A (en) | One kind is based on depth sparse convolution neutral net face emotion identification method | |
CN110532900B (en) | Facial expression recognition method based on U-Net and LS-CNN | |
CN108615010B (en) | Facial expression recognition method based on parallel convolution neural network feature map fusion | |
CN108537743B (en) | Face image enhancement method based on generation countermeasure network | |
CN112800903B (en) | Dynamic expression recognition method and system based on space-time diagram convolutional neural network | |
CN105205475B (en) | A kind of dynamic gesture identification method | |
WO2018107760A1 (en) | Collaborative deep network model method for pedestrian detection | |
CN110728209A (en) | Gesture recognition method and device, electronic equipment and storage medium | |
CN109255364A (en) | A kind of scene recognition method generating confrontation network based on depth convolution | |
CN107679491A (en) | A kind of 3D convolutional neural networks sign Language Recognition Methods for merging multi-modal data | |
CN110175501B (en) | Face recognition-based multi-person scene concentration degree recognition method | |
CN108710829A (en) | A method of the expression classification based on deep learning and the detection of micro- expression | |
CN108053398A (en) | A kind of melanoma automatic testing method of semi-supervised feature learning | |
CN106503687A (en) | The monitor video system for identifying figures of fusion face multi-angle feature and its method | |
CN109815826A (en) | The generation method and device of face character model | |
CN110889672A (en) | Student card punching and class taking state detection system based on deep learning | |
CN110378208B (en) | Behavior identification method based on deep residual error network | |
CN105740892A (en) | High-accuracy human body multi-position identification method based on convolutional neural network | |
CN106156765A (en) | safety detection method based on computer vision | |
CN114038037A (en) | Expression label correction and identification method based on separable residual attention network | |
CN115966010A (en) | Expression recognition method based on attention and multi-scale feature fusion | |
Borgalli et al. | Deep learning for facial emotion recognition using custom CNN architecture | |
CN106909938A (en) | Viewing angle independence Activity recognition method based on deep learning network | |
CN111028319A (en) | Three-dimensional non-photorealistic expression generation method based on facial motion unit | |
CN111401116B (en) | Bimodal emotion recognition method based on enhanced convolution and space-time LSTM network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20171222 |