CN107506722A - Face emotion recognition method based on a deep sparse convolutional neural network - Google Patents
Face emotion recognition method based on a deep sparse convolutional neural network
- Publication number
- CN107506722A (application number CN201710714001.6A)
- Authority
- CN
- China
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/174—Facial expression recognition
- G06V40/175—Static expression
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
- G06F18/2136—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on sparsity criteria, e.g. with an overcomplete basis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/061—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using biological neurons, e.g. biological neurons connected to an integrated circuit
Abstract
The invention provides a face emotion recognition method based on a deep sparse convolutional neural network: the emotion image is first preprocessed, emotional features are then extracted, and finally the emotional features are recognized and classified. In the method, the Nesterov accelerated gradient descent (NAGD) algorithm is used to optimize the weights of the deep sparse convolutional neural network so that the network structure becomes optimal and the generalization of the face emotion recognition algorithm is improved. Because NAGD looks ahead, it can predictively keep the algorithm from advancing too fast or too slow, while also enhancing the responsiveness of the algorithm and reaching a better local optimum.
Description
Technical Field
The invention relates to a face emotion recognition method based on a deep sparse convolution neural network, and belongs to the field of pattern recognition.
Background
In recent years, with the development of various technologies, the degree of intelligence in society has continuously improved, and people are increasingly eager to experience natural and harmonious human-computer interaction. However, emotion has long been a gap between humans and machines that is difficult to bridge. Therefore, breaking through the bottleneck of current affective computing is the key to the development of the artificial emotion field. Facial expression is one of the important channels of human emotional expression, and face emotion recognition has clear application value in fields such as human-computer interaction, fatigue-driving detection, remote nursing and pain assessment, with very broad application prospects. More accurate expression recognition can therefore promote the intelligent development of society.
Face emotion recognition can be divided mainly into emotional feature extraction and emotional feature recognition and classification. Face emotion recognition is still at the laboratory stage and cannot yet recognize the other party's expression as naturally and smoothly as a human does during human-computer interaction. Existing face emotion recognition algorithms have difficulty extracting emotional features accurately, their complexity is high, their recognition time is long, and they cannot meet the real-time requirements of human-computer interaction. Therefore, extracting features that differ clearly between expressions, classifying expressions of different forms more accurately, and improving algorithm efficiency are the keys to realizing face emotion recognition.
Deep learning is a new field of machine learning research; its motivation is to build neural networks that simulate the way the human brain analyzes, learns from, and interprets data. The deep sparse convolutional neural network is a neural network composed of a convolutional neural network, Dropout and Softmax regression, and is one kind of deep learning model. It introduces randomized sparsity into the deep convolutional network through the Dropout layer, which improves the training efficiency of the network; because the network structure optimized in each training pass is different, the optimization of the weights does not depend on the joint action of neurons in fixed relations, the co-adaptation between neurons is weakened (analogous to sexual reproduction in natural selection), and the generalization ability of the network is improved. In deep learning the choice of optimization algorithm is very important; previous studies usually consider only the design of the network structure, and the traditional gradient descent algorithm easily falls into a poor local optimum, giving the neural network poor generalization performance.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a face emotion recognition method based on a deep sparse convolutional neural network in which the Nesterov Accelerated Gradient Descent (NAGD) algorithm is selected to optimize the weights of the deep sparse convolutional neural network, so that the network structure becomes optimal and the generalization of the face emotion recognition algorithm is improved. Because NAGD looks ahead, it can predictively keep the algorithm from advancing too fast or too slow, while also enhancing the responsiveness of the algorithm and obtaining a better local optimum.
The technical scheme adopted by the invention for solving the technical problem is as follows: the method for recognizing the human face emotion based on the deep sparse convolution neural network comprises the following steps of:
(1) preprocessing the emotion image: firstly, carrying out rotation correction and face cutting processing on an emotion image sample to be recognized, extracting an emotion characteristic key area, normalizing the image to be of a uniform size, and then carrying out histogram equalization on the emotion image to obtain a preprocessed emotion image;
(2) extracting emotional characteristics: firstly, extracting main component emotional characteristics of a preprocessed emotional image based on a PCA method to obtain characteristic data of different emotions; then, whitening the extracted feature data to obtain a PCA feature map of the emotion image to be recognized;
(3) and (3) emotion feature identification and classification: the method comprises the steps of constructing a deep sparse convolution neural network consisting of a convolution layer, a sub-sampling layer, a Dropout layer and a Softmax regression layer, firstly inputting a PCA characteristic diagram of a training set and a label value corresponding to emotion into the deep sparse convolution neural network, optimizing the deep sparse convolution neural network by adopting a Nesterov accelerated gradient descent algorithm, then inputting the PCA characteristic diagram of an emotion image to be recognized into the deep sparse convolution neural network, and outputting a recognition result, namely the label value corresponding to the emotion type.
The emotion image preprocessing in the step (1) specifically comprises the following processes:
(1-1) calibrating three feature points of two eyes and a nose tip in the emotion image to obtain coordinate values of the three feature points;
(1-2) rotating the emotion image according to the coordinate values of the left eye and the right eye, enabling the two eyes to be on the same horizontal line, setting the interval between the two eyes to be d, and setting the middle point of the interval to be O;
(1-3) cutting the face according to the facial feature points: with O as the reference, a rectangular region extending d to the left and to the right in the horizontal direction and 0.5d upward and 1.5d downward in the vertical direction is cut out; this rectangular region is the face emotion sub-region;
(1-4) transforming the scale of the face emotion subarea into a uniform size of 128 x 128 pixels;
and (1-5) carrying out histogram equalization on the emotion subarea of the human face to obtain a preprocessed emotion image.
The extraction of the emotional features in the step (2) specifically comprises the following steps:
(2-1) performing mean normalization by subtracting the average brightness value mu of the emotion image to make the characteristic mean values of data in the emotion image be around 0, specifically comprising the following steps:
the preprocessed emotion image data of size 128 × 128 are stored in a 128 × 128 matrix, i.e. {x′^(1), x′^(2), …, x′^(n)}, x′^(i) ∈ R^n with n = 128, and each preprocessed emotion image is zero-averaged using equations (1) and (2):
μ = (1/n) Σ_{i=1}^{n} x′^(i)    (1)
x′^(i) = x′^(i) - μ    (2)
(2-2) calculating the eigenvector matrix U of the covariance matrix Σ of the zero-averaged emotion image, where Σ is computed as:
Σ = (1/n) Σ_{i=1}^{n} (x′^(i)) (x′^(i))^T    (3)
the emotion image pixel values x′ are then expressed in the basis {u_1, u_2, …, u_n} formed by the eigenvectors in U:
x′_rot = U^T x′ = [u_1^T x′, u_2^T x′, …, u_n^T x′]^T    (4)
(2-3) selecting the first k principal components of x′_rot so as to retain 99% of the variance, i.e. the minimum k satisfying equation (5):
(Σ_{j=1}^{k} λ_j) / (Σ_{j=1}^{n} λ_j) ≥ 0.99    (5)
where λ_j denotes the jth eigenvalue corresponding to the eigenvectors in U;
(2-4) setting all components of x′_rot other than the k retained principal components to zero to obtain x̃′, the approximate representation of x′_rot:
x̃′ = [x′_rot,1, …, x′_rot,k, 0, …, 0]^T ≈ [x′_rot,1, …, x′_rot,k, x′_rot,k+1, …, x′_rot,n]^T = x′_rot    (6)
(2-5) scaling x̃′ so that the individual features are decorrelated and all have unit variance:
x′_PCAwhite,i = x̃′_i / √(λ_i + ε)    (7)
(2-6) applying ZCA whitening, i.e. transforming back with the eigenvector matrix U, so that the covariance matrix of the emotion image becomes the identity matrix I:
x′_ZCAwhite = U x′_PCAwhite    (8)
x′_ZCAwhite is the PCA feature map of the emotion image to be recognized.
The label values 1 to 7 in step (3) correspond one-to-one to the 7 emotion categories: anger, disgust, fear, happiness, neutral, sadness, and surprise.
The emotion feature identification and classification in the step (3) specifically comprises the following processes:
(3-1) creating a deep sparse convolutional neural network consisting, in order, of a convolutional layer, a sub-sampling layer, a Dropout layer and a Softmax regression layer, and inputting the training set data into it; the training set data comprise the PCA feature maps of the training set and the label values of the corresponding emotions, i.e. {(x_1, y_1), …, (x_m, y_m)} with y_i ∈ {1, 2, …, k}, where x_i is a PCA feature map of the training set, y_i is the emotion label value corresponding to x_i, and i ∈ {1, 2, …, m}; the deep sparse convolutional neural network is trained iteratively with the NAGD algorithm, which comprises the following process:
(3-1-1) randomly shuffling the training set data, grouping the data in the training set, wherein the number of the data in each group is consistent, and sequentially inputting each group into a deep sparse convolution neural network;
(3-1-2) each group of training set data firstly passes through a convolutional layer respectively, wherein the convolutional layer is provided with 100 convolutional kernels with the dimensionality of 29 multiplied by 29, and the moving step length of the convolutional kernels is 1; the deep sparse convolution neural network excavates local correlation information in a PCA characteristic diagram of a training set through convolution kernels, and the implementation process of the convolution layers is as follows:
a_{i,k} = f(x_i * rot90(W_k, 2) + b_k)    (9)
where a_{i,k} is the convolution feature map obtained by convolving (a 'valid' convolution operation) the ith PCA feature map x_i of the training set with the kth convolution kernel of the convolutional layer, W_k is the weight of the kth convolution kernel, b_k is the bias corresponding to the kth convolution kernel, and f(·) is a Sigmoid-type activation function:
f(x) = 1 / (1 + e^(-x))    (10)
(3-1-3) inputting the convolution feature maps generated by the convolutional layer into a sub-sampling layer, which uses average pooling with pooling dimension 4 and moving step 4; after a convolution feature map passes through the sub-sampling layer, the pooled feature map becomes one fourth of the original size while the number of feature maps is unchanged; the average pooling uses the following formula:
c_j = f(a_{i,k} * (1/p²))    (11)
where c_j is the jth pooling feature map generated by the sub-sampling layer and p is the average pooling dimension;
(3-1-4) using the Dropout layer to reduce over-fitting of the network: elements of the data passing through the Dropout layer, i.e. the pooling feature maps generated in step (3-1-3), are randomly deactivated (set to zero) with a set probability, and the remaining data are kept; the calculation process is:
DropoutTrain(x)=RandomZero(p)×x (12)
where DropoutTrain(x) denotes the data matrix obtained after passing through the Dropout layer in the training stage, and RandomZero(p) denotes setting values of the data matrix x input to this layer to 0 with the set probability p;
(3-1-5) classifying and identifying rows of the data matrix obtained after passing through a Dropout layer by utilizing a Softmax regression layer:
(3-1-5-1) using the hypothesis function h_θ(x) to calculate the probability value p(y = j | x) that the data matrix obtained after the Dropout layer belongs to each expression category j; h_θ(x) is a k-dimensional vector whose elements correspond to the probability values of the k categories and sum to 1, and h_θ(x) has the form:
h_θ(x^(i)) = [p(y^(i)=1 | x^(i); θ), …, p(y^(i)=k | x^(i); θ)]^T = (1 / Σ_{j=1}^{k} e^(θ_j^T x^(i))) [e^(θ_1^T x^(i)), …, e^(θ_k^T x^(i))]^T    (13)
where θ_1, θ_2, …, θ_k ∈ R^(n+1) are the parameters of the model, assigned randomly at the start of training, and x^(i) denotes the ith pooling feature map data in the data matrix obtained after the Dropout layer;
(3-1-5-2) the Softmax regression layer evaluates the classification effect using a cost function J (θ):
where 1{y^(i) = j} is an indicator function whose rule is 1{a true expression} = 1, e.g. 1{1 + 1 = 3} = 0 and 1{1 + 1 = 2} = 1, and y^(i) denotes the emotion label value;
the above formula is derived to obtain a gradient formula:
where λ is the coefficient of the weight decay term in equation (15) and is a preset value;
(3-1-6) calculating residual errors of all layers and gradients of network parameters theta in a cost function J (W, b; x, y) in Softmax regression by using a back propagation algorithm, and specifically comprising the following steps of:
(3-1-6-1) if the l-th layer is fully connected to the l + 1-th layer, the residual calculation of the l-layer uses the following formula:
δ^(l) = ((W^(l))^T δ^(l+1)) · f′(z^(l))    (16)
the gradient calculation formula of the parameter W is:
the gradient calculation formula of the parameter b is as follows:
where δ^(l+1) is the residual of the (l+1)th layer of the network, J(W, b; x, y) is the cost function, (W, b) are the weight and threshold parameters, and (x, y) are the training data and label respectively;
(3-1-6-2) if the l-th layer is a convolutional layer and the l + 1-th layer is a sub-sampling layer, the residual is propagated by the following equation:
where k is the index of the convolution kernel, z_k^(l) denotes x_i * rot90(W_k, 2) + b_k, and f′(·) is the partial derivative of the Sigmoid-type activation function, of the form:
(3-1-7) according to the calculated gradient of θ, NAGD uses a momentum term γ v_{t-1} to update the parameter θ: computing θ - γ v_{t-1} gives an approximation of the future position of the parameter θ, and the NAGD update formulas are:
v_t = γ v_{t-1} + α ∇_θ J(θ - γ v_{t-1})    (21)
θ = θ - v_t    (22)
where ∇_θ J(θ - γ v_{t-1}) is the gradient with respect to the parameter θ computed from the training set (x^(i), y^(i)) at the look-ahead point, α is the learning rate, v_t is the current velocity vector, and v_{t-1} is the velocity vector of the previous iteration; α is initially set to 0.1, v_t is initialized to 0 with the same dimension as the parameter vector θ, and γ ∈ (0, 1]; γ is set to 0.5 at the start of training and increased to 0.95 after the first training iteration;
(3-1-8) returning to the step (3-1-1) until the set iteration times are reached, and finishing the training optimization of the deep sparse convolution neural network;
(3-2) inputting the PCA feature map of the emotion image to be recognized into a deep sparse convolution neural network, and recognizing and classifying the feature map:
(3-2-1) the PCA feature map of the emotion image to be recognized first passes through the convolutional layer and the sub-sampling layer: substituting x′_ZCAwhite for the input x_i in equation (9) gives the convolution feature map a′_{i,k} obtained by convolving the input PCA feature map of the emotion image to be recognized with the kth convolution kernel of the convolutional layer;
then substituting a′_{i,k} for a_{i,k} in equation (11) gives the pooling feature map c′ of the emotion image to be recognized, i.e. the high-level emotional features;
(3-2-2) when the pooled feature map c 'of the emotion image to be recognized continues to pass through a Dropout layer, carrying out average processing on c':
DropoutTest(c′)=(1-p)×c′ (23)
where DropoutTest(c′) denotes the data matrix obtained after the pooling feature map c′ of the emotion image to be recognized passes through the Dropout layer;
(3-2-3) using the hypothesis function h_θ(x) of the Softmax regression layer to calculate the probability value of c′ for each expression category j, and outputting the category j with the largest probability value, i.e. outputting the classification result.
The invention has the beneficial effects based on the technical scheme that:
the invention introduces the randomized sparsity in the deep sparse convolution neural network through the Dropout layer, and the network structures optimized by each training are different, so that the optimization of the weight value does not depend on the combined action of neurons with fixed relation, the joint adaptability among the neurons is weakened, and the generalization capability and the training efficiency of the network are improved. Optimizing the weight of the deep convolutional neural network by adopting NAGD to optimize the network structure; compared with the traditional gradient descent algorithm, the NAGD has the capability of predicting, predictably preventing the algorithm from advancing too fast or too slow, simultaneously enhancing the response capability of the algorithm and obtaining a better local optimal value.
Drawings
FIG. 1 is a general flow diagram of the present invention.
FIG. 2 is a schematic diagram of emotion image preprocessing.
FIG. 3 is a face emotion feature image after feature extraction based on PCA.
FIG. 4 is a schematic diagram of a deep sparse convolutional neural network.
FIG. 5 shows partial image samples from the JAFFE and CK+ databases.
Fig. 6 is a line graph showing the influence of p on the recognition effect and training time in the Dropout layer.
FIG. 7 is a comparison of images before and after the symmetric transformation.
FIG. 8 shows the experimental confusion matrices.
FIG. 9 is a topological structure diagram of a human-computer interaction system based on human face emotion recognition.
FIG. 10 is the GUI system debugging interface.
Detailed Description
The invention is further illustrated by the following figures and examples.
The invention provides a face emotion recognition method based on a deep sparse convolutional neural network; a general flow diagram of the method is shown in FIG. 1. First, the emotion image sample is preprocessed, i.e. the orientation of the face is corrected, the face is cropped, and histogram equalization is applied; then the bottom-level emotional features are extracted based on PCA; finally, the constructed deep sparse convolutional neural network is used to mine and learn the high-level emotional features and to recognize and classify them, and NAGD is used to train and optimize the network weights so as to optimize the whole network structure and improve the face emotion recognition performance.
The method for recognizing the human face emotion based on the deep sparse convolution neural network can be mainly divided into three parts, namely emotion image preprocessing, emotion feature extraction and emotion feature recognition classification, and the realization process comprises the following steps:
(1) preprocessing the emotion image: as shown in fig. 2, firstly, performing rotation correction and face clipping on an emotion image sample to be recognized, extracting an emotion feature key region, normalizing the image to a uniform size, and then performing histogram equalization on the emotion image to obtain a preprocessed emotion image; the method specifically comprises the following steps:
(1-1) manually calibrating the three feature points of the two eyes and the nose tip in the emotion image with the function [x, y] = ginput(3), obtaining the coordinate values of the three feature points;
(1-2) rotating the emotion image according to the coordinate values of the left eye and the right eye, enabling the two eyes to be on the same horizontal line, setting the interval between the two eyes to be d, and setting the middle point of the interval to be O;
(1-3) cutting the face according to the facial feature points: with O as the reference, a rectangular region extending d to the left and to the right in the horizontal direction and 0.5d upward and 1.5d downward in the vertical direction is cut out; this rectangular region is the face emotion sub-region;
(1-4) transforming the scale of the face emotion subarea into a uniform size of 128 x 128 pixels;
and (1-5) carrying out histogram equalization on the emotion subarea of the human face to obtain a preprocessed emotion image.
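The preprocessing of steps (1-1) to (1-5) can be illustrated with a short sketch. The following Python/OpenCV fragment is only one illustrative reading of those steps, not the patent's implementation: it assumes a grayscale uint8 input image and manually supplied eye coordinates (e.g. obtained with ginput(3)), and the cropping proportions follow one reading of step (1-3).

```python
import cv2
import numpy as np

def preprocess_emotion_image(img, left_eye, right_eye, out_size=128):
    """Rotation correction, face cropping, resizing and histogram
    equalization, roughly following steps (1-1) to (1-5)."""
    (lx, ly), (rx, ry) = left_eye, right_eye
    # Rotate so that both eyes lie on the same horizontal line.
    angle = np.degrees(np.arctan2(ry - ly, rx - lx))
    center = ((lx + rx) / 2.0, (ly + ry) / 2.0)   # midpoint O of the eye segment
    rot = cv2.getRotationMatrix2D(center, angle, 1.0)
    rotated = cv2.warpAffine(img, rot, (img.shape[1], img.shape[0]))

    # Crop the face emotion sub-region around O: d to the left/right,
    # 0.5d above and 1.5d below, where d is the distance between the eyes.
    d = np.hypot(rx - lx, ry - ly)
    ox, oy = int(center[0]), int(center[1])
    face = rotated[max(oy - int(0.5 * d), 0): oy + int(1.5 * d),
                   max(ox - int(d), 0): ox + int(d)]

    # Normalize to a uniform 128 x 128 size and equalize the histogram.
    face = cv2.resize(face, (out_size, out_size))
    return cv2.equalizeHist(face)
```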
(2) Extracting emotional characteristics: firstly, extracting main component emotional characteristics of a preprocessed emotional image based on a PCA method to obtain characteristic data which are different among different emotions and easy to process; and whitening the extracted feature data to obtain a PCA feature map of the emotion image to be recognized. The obtained face emotion image after extracting emotion features based on PCA is shown in FIG. 3, and specifically comprises the following steps:
(2-1) performing mean normalization by subtracting the average brightness value mu of the emotion image to make the characteristic mean values of data in the emotion image be around 0, specifically comprising the following steps:
the preprocessed emotion image data of size 128 × 128 are stored in a 128 × 128 matrix, i.e. {x′^(1), x′^(2), …, x′^(n)}, x′^(i) ∈ R^n with n = 128, and each preprocessed emotion image is zero-averaged using equations (1) and (2):
μ = (1/n) Σ_{i=1}^{n} x′^(i)    (1)
x′^(i) = x′^(i) - μ    (2)
(2-2) calculating the eigenvector matrix U of the covariance matrix Σ of the zero-averaged emotion image, where Σ is computed as:
Σ = (1/n) Σ_{i=1}^{n} (x′^(i)) (x′^(i))^T    (3)
the emotion image pixel values x′ are then expressed in the basis {u_1, u_2, …, u_n} formed by the eigenvectors in U:
x′_rot = U^T x′ = [u_1^T x′, u_2^T x′, …, u_n^T x′]^T    (4)
(2-3) selecting the first k principal components of x′_rot so as to retain 99% of the variance, i.e. the minimum k satisfying equation (5):
(Σ_{j=1}^{k} λ_j) / (Σ_{j=1}^{n} λ_j) ≥ 0.99    (5)
where λ_j denotes the jth eigenvalue corresponding to the eigenvectors in U;
(2-4) setting all components of x′_rot other than the k retained principal components to zero to obtain x̃′, the approximate representation of x′_rot:
x̃′ = [x′_rot,1, …, x′_rot,k, 0, …, 0]^T ≈ [x′_rot,1, …, x′_rot,k, x′_rot,k+1, …, x′_rot,n]^T = x′_rot    (6)
(2-5) scaling x̃′ so that the individual features are decorrelated and all have unit variance:
x′_PCAwhite,i = x̃′_i / √(λ_i + ε)    (7)
(2-6) applying ZCA whitening, i.e. transforming back with the eigenvector matrix U, so that the covariance matrix of the emotion image becomes the identity matrix I:
x′_ZCAwhite = U x′_PCAwhite    (8)
x′_ZCAwhite is the PCA feature map of the emotion image to be recognized.
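As a rough sketch of steps (2-1) to (2-6) (equations (1)-(8)), the NumPy fragment below treats the columns of one preprocessed 128 × 128 image as the n = 128 samples, as the description does; the eigen-decomposition is obtained from an SVD of the covariance matrix, and eps plays the role of ε in equation (7). This is a minimal illustration under those assumptions, not the patent's code.

```python
import numpy as np

def zca_feature_map(x, retain=0.99, eps=1e-5):
    """PCA/ZCA whitening of one preprocessed 128x128 emotion image,
    loosely following equations (1)-(8)."""
    x = x.astype(np.float64)
    x = x - x.mean()                        # zero-averaging, equations (1)-(2)
    n = x.shape[0]
    sigma = (x @ x.T) / n                   # covariance matrix, equation (3)
    U, S, _ = np.linalg.svd(sigma)          # columns of U: eigenvectors, S: eigenvalues
    x_rot = U.T @ x                         # projection onto the eigenbasis, equation (4)
    # Keep the first k components that retain 99% of the variance, equations (5)-(6).
    k = int(np.searchsorted(np.cumsum(S) / np.sum(S), retain)) + 1
    x_tilde = np.zeros_like(x_rot)
    x_tilde[:k] = x_rot[:k]
    x_pca_white = x_tilde / np.sqrt(S + eps)[:, None]   # unit variance, equation (7)
    return U @ x_pca_white                  # ZCA whitening, equation (8)
```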
(3) Emotion feature recognition and classification: a deep sparse convolutional neural network consisting of a convolutional layer, a sub-sampling layer (pooling layer), a Dropout layer and a Softmax regression layer, as shown in FIG. 4, is constructed; the convolutional layer, pooling layer and Dropout layer mine and learn high-level emotional features, and the Softmax regression layer recognizes and classifies the learned emotional features and outputs the classification result, i.e. the label value corresponding to the emotion category. The label values 1 to 7 correspond one-to-one to the 7 emotion categories: anger, disgust, fear, happiness, neutral, sadness, and surprise.
Firstly, inputting PCA characteristic graphs of a training set and label values corresponding to emotions into a deep sparse convolutional neural network, optimizing the deep sparse convolutional neural network by adopting a Nesterov accelerated gradient descent algorithm to optimize a network structure so as to improve the generalization of a face emotion recognition algorithm, and after network training is finished, storing an optimal weight of the network to obtain the optimized deep sparse convolutional neural network. And then, in a testing stage, inputting a testing set, namely a PCA characteristic diagram of the emotion image to be recognized, into the deep sparse convolution neural network, and outputting a recognition result, namely a label value corresponding to the emotion category. The method specifically comprises the following steps:
(3-1) creating a deep sparse convolutional neural network consisting, in order, of a convolutional layer, a sub-sampling layer, a Dropout layer and a Softmax regression layer, and inputting the training set data into it; the training set data comprise the PCA feature maps of the training set and the label values of the corresponding emotions, i.e. {(x_1, y_1), …, (x_m, y_m)} with y_i ∈ {1, 2, …, k}, where x_i is a PCA feature map of the training set, y_i is the emotion label value corresponding to x_i, and i ∈ {1, 2, …, m}; the deep sparse convolutional neural network is trained iteratively with the NAGD algorithm, which comprises the following process:
(3-1-1) randomly shuffling the training set data, grouping the data in the training set, wherein the number of the data in each group is consistent, and sequentially inputting each group into a deep sparse convolution neural network;
(3-1-2) each group of training set data firstly passes through a convolutional layer respectively, wherein the convolutional layer is provided with 100 convolutional kernels with the dimensionality of 29 multiplied by 29, and the moving step length of the convolutional kernels is 1; the deep sparse convolution neural network excavates local correlation information in a PCA characteristic diagram of a training set through convolution kernels, and the implementation process of the convolution layers is as follows:
a_{i,k} = f(x_i * rot90(W_k, 2) + b_k)    (9)
where a_{i,k} is the convolution feature map obtained by convolving (a 'valid' convolution operation) the ith PCA feature map x_i of the training set with the kth convolution kernel of the convolutional layer, W_k is the weight of the kth convolution kernel, b_k is the bias corresponding to the kth convolution kernel, and f(·) is a Sigmoid-type activation function:
f(x) = 1 / (1 + e^(-x))    (10)
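A minimal NumPy/SciPy sketch of the convolutional layer of equation (9) follows; the kernels, biases and input are placeholders, and scipy.signal.convolve2d performs a true 'valid' convolution, so passing rot90(W_k, 2) mirrors the formula literally (the net effect is a cross-correlation of x with W_k itself).

```python
import numpy as np
from scipy.signal import convolve2d

def sigmoid(z):
    # Sigmoid-type activation f of equation (10)
    return 1.0 / (1.0 + np.exp(-z))

def conv_layer(x, kernels, biases):
    """Valid convolution of one 128x128 PCA feature map x with each 29x29
    kernel, as in equation (9): a_k = f(x * rot90(W_k, 2) + b_k)."""
    maps = []
    for W_k, b_k in zip(kernels, biases):
        maps.append(sigmoid(convolve2d(x, np.rot90(W_k, 2), mode='valid') + b_k))
    return np.stack(maps)   # 100 maps of size (128 - 29 + 1) x (128 - 29 + 1) = 100 x 100
```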
(3-1-3) inputting the convolution feature maps generated by the convolutional layer into a sub-sampling layer, which uses average pooling with pooling dimension 4 and moving step 4; after a convolution feature map passes through the sub-sampling layer, the pooled feature map becomes one fourth of the original size while the number of feature maps is unchanged; the average pooling uses the following formula:
c_j = f(a_{i,k} * (1/p²))    (11)
where c_j is the jth pooling feature map generated by the sub-sampling layer and p is the average pooling dimension;
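A sketch of the non-overlapping 4 × 4 average pooling of step (3-1-3) follows; note that equation (11) additionally wraps the averaged map in the activation f(·), which is omitted here for brevity.

```python
import numpy as np

def average_pool(feature_map, p=4):
    """Non-overlapping p x p average pooling with stride p;
    a 100x100 convolution map becomes a 25x25 pooled map."""
    h, w = feature_map.shape
    trimmed = feature_map[:h - h % p, :w - w % p]
    return trimmed.reshape(h // p, p, w // p, p).mean(axis=(1, 3))
```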
(3-1-4) using the Dropout layer to reduce over-fitting of the network: elements of the data passing through the Dropout layer, i.e. the pooling feature maps generated in step (3-1-3), are randomly deactivated (set to zero) with a set probability, and the remaining data are kept; the calculation process is:
DropoutTrain(x)=RandomZero(p)×x (12)
where DropoutTrain(x) denotes the data matrix obtained after passing through the Dropout layer in the training stage, and RandomZero(p) denotes setting values of the data matrix x input to this layer to 0 with the set probability p;
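The training-time Dropout of equation (12) can be sketched as below; RandomZero(p) is implemented here as an independent Bernoulli mask, which is an assumption consistent with, but not spelled out in, the description.

```python
import numpy as np

def dropout_train(x, p=0.5, rng=None):
    """Training-time Dropout, equation (12): each element of the pooled
    feature maps is zeroed independently with probability p."""
    rng = rng or np.random.default_rng()
    mask = rng.random(x.shape) >= p    # keep an element with probability 1 - p
    return mask * x
```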
(3-1-5) classifying and identifying the input data by utilizing a Softmax regression layer:
(3-1-5-1) using the hypothesis function h_θ(x) to calculate the probability value p(y = j | x) that the data matrix obtained after the Dropout layer belongs to each expression category j; h_θ(x) is a k-dimensional vector whose elements correspond to the probability values of the k categories and sum to 1, and h_θ(x) has the form:
h_θ(x^(i)) = [p(y^(i)=1 | x^(i); θ), …, p(y^(i)=k | x^(i); θ)]^T = (1 / Σ_{j=1}^{k} e^(θ_j^T x^(i))) [e^(θ_1^T x^(i)), …, e^(θ_k^T x^(i))]^T    (13)
where θ_1, θ_2, …, θ_k ∈ R^(n+1) are the parameters of the model, assigned randomly at the start of training, and x^(i) denotes the ith pooling feature map data in the data matrix obtained after the Dropout layer;
(3-1-5-2) the Softmax regression layer evaluates the classification effect using a cost function J (θ):
where 1{y^(i) = j} is an indicator function whose rule is 1{a true expression} = 1, e.g. 1{1 + 1 = 3} = 0 and 1{1 + 1 = 2} = 1, and y^(i) denotes the emotion label value;
the above formula is derived to obtain a gradient formula:
where λ is the coefficient of the weight decay term in equation (15) and is a preset value;
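The Softmax regression layer of step (3-1-5) follows the usual softmax-with-weight-decay formulation; since the exact forms of equations (13)-(15) are not reproduced in this text, the sketch below uses that standard form, with labels shifted to 0..k-1 and lam standing for the weight decay coefficient λ.

```python
import numpy as np

def softmax_probs(theta, x):
    """Hypothesis h_theta(x): class probabilities for one flattened
    feature vector x given the parameter matrix theta (k x (n+1))."""
    scores = theta @ x
    scores -= scores.max()          # numerical stability; ratios are unchanged
    e = np.exp(scores)
    return e / e.sum()

def softmax_cost_grad(theta, X, y, lam):
    """Cost J(theta) with weight decay and its gradient over a batch
    X (m x (n+1)) with integer labels y in {0, ..., k-1}."""
    m, k = X.shape[0], theta.shape[0]
    P = np.array([softmax_probs(theta, x) for x in X])   # m x k probabilities
    Y = np.eye(k)[y]                                     # one-hot indicator 1{y_i = j}
    cost = -np.sum(Y * np.log(P)) / m + lam / 2.0 * np.sum(theta ** 2)
    grad = -(Y - P).T @ X / m + lam * theta              # k x (n+1) gradient
    return cost, grad
```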
(3-1-6) calculating the residuals of all layers and the gradient of the network parameter θ in the cost function J(W, b; x, y) of the Softmax regression by using the back-propagation algorithm, specifically comprising the following steps:
(3-1-6-1) if the l-th layer is fully connected to the l + 1-th layer, the residual calculation of the l-layer uses the following formula:
δ^(l) = ((W^(l))^T δ^(l+1)) · f′(z^(l))    (16)
the gradient calculation formula of the parameter W is:
the gradient calculation formula of the parameter b is as follows:
where δ^(l+1) is the residual of the (l+1)th layer of the network, J(W, b; x, y) is the cost function, (W, b) are the weight and threshold parameters, and (x, y) are the training data and label respectively;
(3-1-6-2) if the l-th layer is a convolutional layer and the l + 1-th layer is a sub-sampling layer, the residual is propagated by the following equation:
where k is the index of the convolution kernel, z_k^(l) denotes x_i * rot90(W_k, 2) + b_k, and f′(·) is the partial derivative of the Sigmoid-type activation function, of the form:
(3-1-7) according to the calculated gradient of θ, NAGD uses a momentum term γ v_{t-1} to update the parameter θ: computing θ - γ v_{t-1} gives an approximation of the future position of the parameter θ, and the NAGD update formulas are:
v_t = γ v_{t-1} + α ∇_θ J(θ - γ v_{t-1})    (21)
θ = θ - v_t    (22)
where ∇_θ J(θ - γ v_{t-1}) is the gradient with respect to the parameter θ computed from the training set (x^(i), y^(i)) at the look-ahead point, α is the learning rate, v_t is the current velocity vector, and v_{t-1} is the velocity vector of the previous iteration; α is initially set to 0.1, v_t is initialized to 0 with the same dimension as the parameter vector θ, and γ ∈ (0, 1]; γ is set to 0.5 at the start of training and increased to 0.95 after the first training iteration;
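Step (3-1-7) amounts to the usual Nesterov accelerated gradient update. A minimal sketch follows, with the gradient function (supplied by back-propagation, equations (16)-(21)) and the parameter shapes left as assumptions.

```python
import numpy as np

def nagd_step(theta, v_prev, grad_fn, alpha=0.1, gamma=0.5):
    """One NAGD update: evaluate the gradient at the look-ahead point
    theta - gamma * v_prev, then v_t = gamma * v_prev + alpha * grad
    and theta = theta - v_t (equation (22))."""
    lookahead = theta - gamma * v_prev
    v_t = gamma * v_prev + alpha * grad_fn(lookahead)
    return theta - v_t, v_t

# Usage sketch: v starts at np.zeros_like(theta), alpha = 0.1,
# gamma = 0.5 for the first iteration and 0.95 afterwards.
```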
(3-1-8) returning to step (3-1-1) until the set number of iterations is reached, completing the training optimization of the deep sparse convolutional neural network;
(3-2) inputting the PCA feature map of the emotion image to be recognized into a deep sparse convolution neural network, and recognizing and classifying the feature map:
(3-2-1) the PCA feature map of the emotion image to be recognized first passes through the convolutional layer and the sub-sampling layer: substituting x′_ZCAwhite for the input x_i in equation (9) gives the convolution feature map a′_{i,k} obtained by convolving the input PCA feature map of the emotion image to be recognized with the kth convolution kernel of the convolutional layer;
then substituting a′_{i,k} for a_{i,k} in equation (11) gives the pooling feature map c′ of the emotion image to be recognized, i.e. the high-level emotional features;
(3-2-2) when the pooled feature map c 'of the emotion image to be recognized continues to pass through a Dropout layer, carrying out average processing on c':
DropoutTest(c′)=(1-p)×c′ (23)
where DropoutTest(c′) denotes the data matrix obtained after the pooling feature map c′ of the emotion image to be recognized passes through the Dropout layer;
(3-2-3) using the hypothesis function h_θ(x) of the Softmax regression layer to calculate the probability value of c′ for each expression category j, and outputting the category j with the largest probability value, i.e. outputting the classification result.
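Putting the pieces together, the test-time path of step (3-2) could look like the sketch below; it reuses the conv_layer, average_pool and softmax_probs sketches above, and the flattening and bias-term handling are assumptions about how the pooled maps are fed to the Softmax layer.

```python
import numpy as np

def recognise_emotion(x_zca, kernels, biases, theta, p=0.5):
    """Test-time pipeline of step (3-2): convolution and pooling on the
    ZCA feature map, Dropout replaced by the (1 - p) scaling of
    equation (23), then Softmax; returns the label value 1..7."""
    conv_maps = conv_layer(x_zca, kernels, biases)            # a'_{i,k}
    pooled = np.stack([average_pool(m) for m in conv_maps])   # c'
    scaled = (1.0 - p) * pooled                               # DropoutTest, equation (23)
    features = np.append(scaled.ravel(), 1.0)                 # flatten and add a bias term
    return int(np.argmax(softmax_probs(theta, features))) + 1
```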
The face emotion databases used in the experiments with this method are the JAFFE and CK+ databases; partial sample images are shown in FIG. 5, where the first row contains JAFFE samples and the second row CK+ samples. The JAFFE database consists of 213 grayscale images of 7 basic expressions from 10 women; the image size is 256 × 256, and there are 2 to 4 images per expression per person. The CK+ database consists of 210 adults of different ethnicities and genders aged 18 to 50 and contains 326 labeled expression image sequences; each image is 640 × 490 and covers 7 expressions, namely anger, disgust, fear, happiness, sadness, surprise and contempt. Taking the expression in the calm state as the neutral expression and combining it with the peak image frames of the six expressions other than contempt yields 399 images of seven basic expressions.
80% of the JAFFE facial expression database was used as a training sample and 20% as a test sample. Fig. 6 shows a graph obtained by changing the size of the p value in the Dropout layer, and it can be seen that the training time gradually shortens and the recognition rate tends to increase as the p value increases. This indicates that when training a deep sparse convolutional neural network, selecting an appropriate p-value in the Dropout layer is beneficial to improve the generalization performance of the network and shorten the required training time. The influence of p on the training time and the recognition rate is comprehensively considered, and p is 0.5 as an optimal value, so that the time required by network training can be effectively reduced, the training efficiency is greatly improved, the network performance is also improved, and a good recognition effect can be obtained.
One problem common to deep learning algorithms is that they need a large amount of data to learn from during the training phase. However, the amount of data available in some existing public databases is not sufficient. Therefore, to increase the number of training samples without introducing duplicate samples, all original samples are symmetrically transformed, doubling the number of database samples; a comparison of images before and after the symmetric transformation is shown in FIG. 7. To verify the effectiveness of the added samples, a controlled-variable experiment was set up as follows: 80% of the JAFFE facial expression database was used as training samples and 20% as test samples. Keeping the parameters of the algorithm unchanged, the proposed deep sparse convolutional neural network was trained with two training sets, consisting of the original images and of the images with the symmetric transformations added, respectively; the test sets used in the two experiments are identical. Since the Dropout layer has a significant effect on the recognition result, in order to highlight the effect of the added samples, p was set to 0 in this experiment (masking the Dropout layer) and NAGD was used to optimize the network. The experimental results are shown in Table 1:
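The symmetric transformation used to double the training set is presumably a horizontal mirror of each original image (the patent only calls it a symmetric transformation); a minimal NumPy sketch under that assumption:

```python
import numpy as np

def mirror_augment(images):
    """Doubles the training set by adding the horizontally mirrored
    (symmetrically transformed) copy of every original sample."""
    return list(images) + [np.fliplr(img) for img in images]
```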
TABLE 1 comparison of the results
Table 2 shows the emotion recognition results obtained by training the deep sparse convolutional neural network with the conventional momentum-based stochastic gradient descent (MSGD) algorithm and with the NAGD algorithm. The experiments use the JAFFE database, one symmetrically transformed image is added per training sample, and p is set to 0.5. The experimental results show that training the network with NAGD yields more stable results and a better recognition effect than training it with MSGD.
TABLE 2 NAGD and MSGD test results
In order to verify the effectiveness of the algorithm provided by the invention, experiments are respectively carried out in JAFFE and CK + databases. 80% of the JAFFE facial expression database was used as a training sample and 20% as a test sample. The CK + database has wider ranges of ages, sexes and races than the JAFFE database, and in order to better learn various emotional characteristics of various people, the CK + database uses a training set with a larger proportion than JAFFE, namely 90% of images in the database are selected as the training set, and the rest 10% of images are selected as the test set. The experimental results obtained by adding a symmetrically transformed image to each picture in the training set, taking 1 and p as 0.5 are shown in table 3.
TABLE 3 identification results obtained on JAFFE and CK + databases
As can be seen from Table 3, the proposed algorithm achieves good recognition on both the JAFFE and CK+ databases, with a recognition rate of 97.62% on JAFFE and 95.12% on CK+. Dividing the training and recognition times given in the table by the number of images in the training/test set, the training time per image averages 0.6757 seconds and the recognition time per image averages 0.1258 seconds. The recognition rate and misclassification of each type of expression in the two experiments are shown in the confusion matrices of FIG. 8, in which AN., DI., FE., HA., NE., SA. and SU. correspond to the seven basic expressions anger, disgust, fear, happiness, neutral, sadness and surprise, respectively.
The invention builds a set of human-computer interaction system based on a human face emotion recognition algorithm, the human-computer interaction system mainly comprises a wheeled robot, an emotion calculation workstation, a router, data transmission equipment and the like, and a topological structure diagram is shown in figure 9. The system firstly acquires human face emotion image frame data through a Kinect configured on the wheeled robot, then transmits the data to an emotion calculation workstation, the workstation inputs the data to a trained human face emotion recognition system for recognition, and finally the workstation feeds back a recognition result to the wheeled robot, so that the wheeled robot can realize natural and harmonious interaction with a human.
A GUI interface is built on a system debugging interface of the human-computer interaction system for debugging through MATLAB 2016a, and a schematic diagram of the GUI interface is shown in FIG. 10. In a GUI system debugging interface, clicking an image preview button, calling a Kinect color camera by the system, and displaying a captured image on an image window on the left side of the GUI interface in real time; clicking an emotion recognition button, acquiring a currently captured image and displaying the currently captured image on an image window on the right side of a GUI interface, then manually acquiring coordinates of two eyes and a nose tip so as to correct and cut a face, and inputting the cut face image into a trained deep convolutional neural network for face emotion recognition; and feeding back the final recognition result to the GUI interface and displaying the final recognition result.
Two groups of image frames of the 7 basic expressions from 3 individuals are collected as a training set and input into the deep convolutional neural network for training; the image frames subsequently captured by the Kinect are then input into the trained network for recognition. Table 4 shows the online recognition results of the 7 basic expressions for the 3 persons.
table 4 application test results
As can be seen from the table, the average recognition rate of the three groups of experiments is 76.190%, which shows the prospect of the invention in practical application.
Claims (5)
1. A face emotion recognition method based on a deep sparse convolution neural network is characterized by comprising the following steps:
(1) preprocessing the emotion image: firstly, carrying out rotation correction and face cutting processing on an emotion image sample to be recognized, extracting an emotion characteristic key area, normalizing the image to be of a uniform size, and then carrying out histogram equalization on the emotion image to obtain a preprocessed emotion image;
(2) extracting emotional characteristics: firstly, extracting main component emotional characteristics of a preprocessed emotional image based on a PCA method to obtain characteristic data of different emotions; then, whitening the extracted feature data to obtain a PCA feature map of the emotion image to be recognized;
(3) and (3) emotion feature identification and classification: the method comprises the steps of constructing a deep sparse convolution neural network consisting of a convolution layer, a sub-sampling layer, a Dropout layer and a Softmax regression layer, firstly inputting a PCA characteristic diagram of a training set and a label value corresponding to emotion into the deep sparse convolution neural network, optimizing the deep sparse convolution neural network by adopting a Nesterov accelerated gradient descent algorithm, then inputting the PCA characteristic diagram of an emotion image to be recognized into the deep sparse convolution neural network, and outputting a recognition result, namely the label value corresponding to the emotion type.
2. The facial emotion recognition method based on the deep sparse convolutional neural network of claim 1, wherein: the emotion image preprocessing in the step (1) specifically comprises the following processes:
(1-1) calibrating three feature points of two eyes and a nose tip in the emotion image to obtain coordinate values of the three feature points;
(1-2) rotating the emotion image according to the coordinate values of the left eye and the right eye, enabling the two eyes to be on the same horizontal line, setting the interval between the two eyes to be d, and setting the middle point of the interval to be O;
(1-3) cutting the face according to the facial feature points: with O as the reference, a rectangular region extending d to the left and to the right in the horizontal direction and 0.5d upward and 1.5d downward in the vertical direction is cut out; this rectangular region is the face emotion sub-region;
(1-4) transforming the scale of the face emotion subarea into a uniform size of 128 x 128 pixels;
and (1-5) carrying out histogram equalization on the emotion subarea of the human face to obtain a preprocessed emotion image.
3. The facial emotion recognition method based on the deep sparse convolutional neural network of claim 1, wherein: the extraction of the emotional features in the step (2) specifically comprises the following steps:
(2-1) performing mean normalization by subtracting the average brightness value mu of the emotion image to make the characteristic mean values of data in the emotion image be around 0, specifically comprising the following steps:
the preprocessed emotion image data of size 128 × 128 are stored in a 128 × 128 matrix, i.e. {x′^(1), x′^(2), …, x′^(n)}, x′^(i) ∈ R^n with n = 128, and each preprocessed emotion image is zero-averaged using equations (1) and (2):
μ = (1/n) Σ_{i=1}^{n} x′^(i)    (1)
x′^(i) = x′^(i) - μ    (2)
(2-2) calculating the eigenvector matrix U of the covariance matrix Σ of the zero-averaged emotion image, where Σ is computed as:
Σ = (1/n) Σ_{i=1}^{n} (x′^(i)) (x′^(i))^T    (3)
the emotion image pixel values x′ are then expressed in the basis {u_1, u_2, …, u_n} formed by the eigenvectors in U:
x′_rot = U^T x′ = [u_1^T x′, u_2^T x′, …, u_n^T x′]^T    (4)
(2-3) selecting the first k principal components of x′_rot so as to retain 99% of the variance, i.e. the minimum k satisfying equation (5):
(Σ_{j=1}^{k} λ_j) / (Σ_{j=1}^{n} λ_j) ≥ 0.99    (5)
where λ_j denotes the jth eigenvalue corresponding to the eigenvectors in U;
(2-4) setting all components of x′_rot other than the k retained principal components to zero to obtain x̃′, the approximate representation of x′_rot:
x̃′ = [x′_rot,1, …, x′_rot,k, 0, …, 0]^T ≈ [x′_rot,1, …, x′_rot,k, x′_rot,k+1, …, x′_rot,n]^T = x′_rot    (6)
(2-5) scaling x̃′ so that the individual features are decorrelated and all have unit variance:
x′_PCAwhite,i = x̃′_i / √(λ_i + ε)    (7)
(2-6) transforming the feature vector U by adopting ZCA whitening to change the covariance matrix of the emotion image into a unit matrix I:
x′ZCAwhite=Ux′PCAwhite(8)
where x'_ZCAwhite is the PCA feature map of the emotion image to be recognized.
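For illustration only, a minimal NumPy sketch of the whitening pipeline of equations (4)-(8) could look as follows. The function name zca_whiten, the row-per-image data layout, and the epsilon value are assumptions of this sketch, not part of the claimed method.

```python
import numpy as np

def zca_whiten(X, var_retained=0.99, eps=1e-5):
    """PCA rotation, top-k selection, PCA whitening and ZCA whitening (eqs. 4-8).

    X: (m, n) matrix with one zero-mean emotion image vector per row.
    Returns the ZCA-whitened data and the number k of retained components.
    """
    m, n = X.shape
    sigma = X.T @ X / m                       # covariance matrix of the data
    U, S, _ = np.linalg.svd(sigma)            # eigenvectors U, eigenvalues S (descending)

    x_rot = X @ U                             # eq. (4): rotate into the PCA basis
    k = int(np.searchsorted(np.cumsum(S) / np.sum(S), var_retained)) + 1  # eq. (5)

    x_tilde = x_rot.copy()                    # eq. (6): zero all but the first k components
    x_tilde[:, k:] = 0.0

    x_pca_white = x_tilde / np.sqrt(S + eps)  # eq. (7): rescale each feature to unit variance
    x_zca_white = x_pca_white @ U.T           # eq. (8): rotate back (ZCA whitening)
    return x_zca_white, k
```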
4. The facial emotion recognition method based on the deep sparse convolutional neural network of claim 1, wherein: the label values 1-7 in step (3) correspond one-to-one to the 7 emotion categories, namely anger, disgust, fear, happiness, neutrality, sadness and surprise.
5. The facial emotion recognition method based on the deep sparse convolutional neural network of claim 3, wherein: the emotion feature identification and classification in the step (3) specifically comprises the following processes:
(3-1) creating a deep sparse convolutional neural network consisting, in order, of a convolutional layer, a sub-sampling layer, a Dropout layer and a Softmax regression layer, and inputting the training set data into the deep sparse convolutional neural network, where the training set data comprises the PCA feature maps of the training set and the label values of the corresponding emotions, namely {(x_1, y_1), ..., (x_m, y_m)} with y_i ∈ {1, 2, ..., k}, x_i being the ith PCA feature map of the training set, y_i the emotion label value corresponding to x_i, and i ∈ {1, 2, ..., m}; the deep sparse convolutional neural network is trained iteratively with the NAGD algorithm, and the iterative training comprises the following processes:
(3-1-1) randomly shuffling the training set data and dividing it into groups of equal size, then inputting each group in turn into the deep sparse convolutional neural network;
(3-1-2) each group of training set data first passes through the convolutional layer, which has 100 convolution kernels of size 29 × 29 with a stride of 1; the deep sparse convolutional neural network mines local correlation information in the PCA feature maps of the training set through the convolution kernels, and the convolutional layer is implemented as follows:
a_{i,k} = f(x_{i} * \mathrm{rot90}(W_{k}, 2) + b_{k})    (9)
where a_{i,k} is the convolution feature map obtained by convolving the ith PCA feature map x_i of the training set with the kth convolution kernel of the convolutional layer, i.e. a 'valid' convolution operation, W_k is the weight of the kth convolution kernel, b_k is the bias corresponding to the kth convolution kernel, and f(·) is a Sigmoid-type activation function:
f(x) = \frac{1}{1 + e^{-x}}    (10)
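A minimal sketch of the 'valid' convolution of equations (9)-(10) is given below; the helper names conv_feature_map and sigmoid are hypothetical, and the code simply slides the 180°-rotated kernel over the input with stride 1 and applies the activation.

```python
import numpy as np

def sigmoid(x):
    # eq. (10): Sigmoid-type activation
    return 1.0 / (1.0 + np.exp(-x))

def conv_feature_map(x_i, W_k, b_k):
    """eq. (9): a_{i,k} = f(x_i * rot90(W_k, 2) + b_k), 'valid' region only."""
    W_flipped = np.rot90(W_k, 2)                       # rotate the kernel by 180 degrees
    H, W = x_i.shape
    kh, kw = W_flipped.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for r in range(out.shape[0]):                      # stride 1 over the 'valid' positions
        for c in range(out.shape[1]):
            out[r, c] = np.sum(x_i[r:r + kh, c:c + kw] * W_flipped)
    return sigmoid(out + b_k)
```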
(3-1-3) inputting the convolution feature maps generated by the convolutional layer into the sub-sampling layer, which uses average pooling with a pooling dimension of 4 and a stride of 4; after the sub-sampling layer, the height and width of each pooled feature map become one quarter of those of the convolution feature map, while the number of feature maps is unchanged; the average pooling uses the following formula:
c_{j} = f\left(a_{i,k} * \frac{1}{p^{2}}\right)    (11)
where c_j is the jth pooled feature map generated by the sub-sampling layer, and p is the average pooling dimension;
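A minimal sketch of the non-overlapping p × p average pooling of equation (11), assuming the feature map sides are divisible by p (the activation f of equation (11) is omitted here for brevity):

```python
import numpy as np

def average_pool(a, p=4):
    """Non-overlapping p x p average pooling with stride p (cf. eq. (11))."""
    H, W = a.shape
    a = a[:H - H % p, :W - W % p]                       # crop so both sides divide by p
    pooled = a.reshape(H // p, p, W // p, p).mean(axis=(1, 3))
    return pooled                                       # each side shrinks by a factor of p
```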
(3-1-4) using the Dropout layer to reduce network overfitting: the data passing through the Dropout layer, i.e. the pooled feature maps generated in step (3-1-3), are randomly deactivated (set to zero) with a set probability while the remaining data are kept, and the calculation process is as follows:
\mathrm{DropoutTrain}(x) = \mathrm{RandomZero}(p) \times x    (12)
where DropoutTrain(x) is the data matrix output by the Dropout layer during the training stage, and RandomZero(p) sets each value of the input data matrix x to 0 with the set probability p;
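A minimal sketch of the training-time Dropout of equation (12); the function name dropout_train and the default probability are assumptions of this sketch.

```python
import numpy as np

def dropout_train(x, p=0.5, rng=None):
    """eq. (12): zero each entry of x with probability p during training."""
    rng = np.random.default_rng() if rng is None else rng
    mask = (rng.random(x.shape) >= p).astype(x.dtype)   # RandomZero(p): keep with prob. 1-p
    return mask * x
```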
(3-1-5) classifying and recognizing the data matrix obtained after the Dropout layer using the Softmax regression layer:
(3-1-5-1) using the hypothesis function h_θ(x) to calculate the probability p(y = j | x) that the data matrix obtained after the Dropout layer belongs to each expression category j; h_θ(x) is a k-dimensional vector whose elements are the probabilities of the k classes and sum to 1, and h_θ(x) takes the form:
h_{\theta}(x^{(i)}) = \begin{bmatrix} p(y^{(i)}=1 \mid x^{(i)}; \theta) \\ p(y^{(i)}=2 \mid x^{(i)}; \theta) \\ \vdots \\ p(y^{(i)}=k \mid x^{(i)}; \theta) \end{bmatrix} = \frac{1}{\sum_{j=1}^{k} e^{\theta_{j}^{T} x^{(i)}}} \begin{bmatrix} e^{\theta_{1}^{T} x^{(i)}} \\ e^{\theta_{2}^{T} x^{(i)}} \\ \vdots \\ e^{\theta_{k}^{T} x^{(i)}} \end{bmatrix}    (13)
where θ_1, θ_2, ..., θ_k ∈ R^{n+1} are the model parameters, assigned randomly at the start of training; x^{(i)} denotes the ith pooled feature map data in the data matrix obtained after the Dropout layer;
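Equation (13) can be sketched numerically as below; the subtraction of the maximum score is a standard numerical-stability device added by this sketch and is not part of the claim.

```python
import numpy as np

def softmax_hypothesis(theta, x):
    """eq. (13): h_theta(x) for one input vector x and a (k, n+1) parameter matrix theta."""
    scores = theta @ x                       # theta_j^T x for every class j
    scores = scores - scores.max()           # stabilise the exponentials (sketch-only detail)
    e = np.exp(scores)
    return e / e.sum()                       # k probabilities that sum to 1
```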
(3-1-5-2) the Softmax regression layer evaluates the classification effect using the cost function J(θ):
J(\theta) = -\frac{1}{m}\left[\sum_{i=1}^{m}\sum_{j=1}^{k} 1\{y^{(i)}=j\} \log \frac{e^{\theta_{j}^{T} x^{(i)}}}{\sum_{l=1}^{k} e^{\theta_{l}^{T} x^{(i)}}}\right] + \frac{\lambda}{2}\sum_{i=1}^{k}\sum_{j=0}^{n} \theta_{ij}^{2}    (14)
where 1{y^{(i)} = j} is the indicator function, whose value is 1 when the expression inside the braces is true and 0 otherwise, e.g. 1{1+1=3} = 0 and 1{1+1=2} = 1; y^{(i)} denotes the emotion label value;
differentiating the above formula gives the gradient formula:
\nabla_{\theta_{j}} J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[x^{(i)}\left(1\{y^{(i)}=j\} - p(y^{(i)}=j \mid x^{(i)}; \theta)\right)\right] + \lambda\theta_{j}    (15)
where λ in equation (15) is the weight decay factor, a preset value;
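A vectorised sketch of the cost (14) and gradient (15), assuming X is an (m, n+1) array with one example per row, y is an integer array of labels 1..k, and theta is (k, n+1); the max-subtraction is again a sketch-only stability device.

```python
import numpy as np

def softmax_cost_and_grad(theta, X, y, lam):
    """eq. (14)-(15): weight-decayed Softmax cost J(theta) and its gradient."""
    m, k = X.shape[0], theta.shape[0]
    scores = X @ theta.T                               # (m, k) class scores
    scores = scores - scores.max(axis=1, keepdims=True)
    probs = np.exp(scores)
    probs /= probs.sum(axis=1, keepdims=True)          # p(y = j | x; theta)

    onehot = np.zeros((m, k))
    onehot[np.arange(m), y - 1] = 1.0                  # indicator 1{y^(i) = j}, labels 1..k

    cost = -np.sum(onehot * np.log(probs)) / m + lam / 2.0 * np.sum(theta ** 2)  # eq. (14)
    grad = -(onehot - probs).T @ X / m + lam * theta                             # eq. (15)
    return cost, grad
```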
(3-1-6) calculating the residuals of all layers and the gradients of the network parameters θ of the cost function J(W, b; x, y) in the Softmax regression using the back-propagation algorithm, specifically comprising the following steps:
(3-1-6-1) if the lth layer is fully connected to the (l+1)th layer, the residual of the lth layer is calculated by the following formula:
\delta^{(l)} = \left((W^{(l)})^{T} \delta^{(l+1)}\right) \cdot f'(z^{(l)})    (16)
the gradient calculation formula of the parameter W is:
\nabla_{W^{(l)}} J(W, b; x, y) = \delta^{(l+1)} (a^{(l)})^{T}    (17)
the gradient calculation formula of the parameter b is as follows:
\nabla_{b^{(l)}} J(W, b; x, y) = \delta^{(l+1)}    (18)
where δ^{(l+1)} is the residual of the (l+1)th layer of the network, J(W, b; x, y) is the cost function, (W, b) are the weight and bias parameters, and (x, y) are the training data and label respectively;
(3-1-6-2) if the l-th layer is a convolutional layer and the l + 1-th layer is a sub-sampling layer, the residual is propagated by the following equation:
\delta_{k}^{(l)} = \mathrm{upsample}\left((W_{k}^{(l)})^{T} \delta_{k}^{(l+1)}\right) \cdot f'(z_{k}^{(l)})    (19)
where k is the index of the convolution kernel, z_k^{(l)} denotes x_i * rot90(W_k, 2) + b_k, and f'(·) is the derivative of the Sigmoid-type activation function, of the form:
f'(x) = \frac{e^{-x}}{(1 + e^{-x})^{2}} = f(x)(1 - f(x))    (20)
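As an illustration of equations (19)-(20), the upsample step that pushes a residual back through p × p average pooling can be sketched as follows; the transposed-weight factor of equation (19) is dropped here because the average-pooling layer of this claim has no learned weights, which is a simplifying assumption of the sketch.

```python
import numpy as np

def f_prime(z):
    # eq. (20): derivative of the Sigmoid, f'(z) = f(z)(1 - f(z))
    fz = 1.0 / (1.0 + np.exp(-z))
    return fz * (1.0 - fz)

def upsample(delta_pooled, p=4):
    """Expand a pooled residual to the pre-pooling size, spreading each value
    uniformly over its p x p block (Kronecker product with a 1/p^2 block)."""
    return np.kron(delta_pooled, np.ones((p, p)) / (p * p))

def conv_layer_residual(delta_next, z, p=4):
    """eq. (19) without the weight factor: delta^(l) = upsample(delta^(l+1)) . f'(z^(l))."""
    return upsample(delta_next, p) * f_prime(z)
```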
(3-1-7) according to the calculated gradient of θ, NAGD uses a momentum term γv_{t-1} to update the parameter θ, and computes θ - γv_{t-1} to obtain an approximation of the future position of the parameter θ; the NAGD update formulas are:
v_{t} = \gamma v_{t-1} + \alpha \nabla_{\theta} J(\theta - \gamma v_{t-1}; x^{(i)}, y^{(i)})    (21)
\theta = \theta - v_{t}    (22)
where ∇_θ J(θ; x^{(i)}, y^{(i)}) is the gradient of the parameter θ computed from the training sample (x^{(i)}, y^{(i)}), α is the learning rate, v_t is the current velocity vector and v_{t-1} is the velocity vector of the previous iteration; α is initially set to 0.1, v_t is initialized to 0 with the same dimension as the parameter vector θ, and γ ∈ (0, 1]; γ is set to 0.5 at the start of training and increased to 0.95 after the first training iteration is finished;
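A minimal sketch of one Nesterov-style update per equations (21)-(22); grad_fn is a hypothetical callback that returns ∇_θ J evaluated at the lookahead point on the current mini-batch.

```python
import numpy as np

def nagd_step(theta, v, grad_fn, alpha=0.1, gamma=0.5):
    """eq. (21)-(22): one Nesterov accelerated gradient descent update."""
    lookahead = theta - gamma * v                      # approximate future position of theta
    v_new = gamma * v + alpha * grad_fn(lookahead)     # eq. (21)
    theta_new = theta - v_new                          # eq. (22)
    return theta_new, v_new
```

In a training loop, gamma would be raised from 0.5 to 0.95 after the first training iteration, as stated above.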
(3-1-8) returning to step (3-1-1) until the set number of iterations is reached, completing the training optimization of the deep sparse convolutional neural network;
(3-2) inputting the PCA feature map of the emotion image to be recognized into the trained deep sparse convolutional neural network, and recognizing and classifying the feature map:
(3-2-1) the PCA feature map of the emotion image to be recognized first passes through the convolutional layer and the sub-sampling layer: substituting x'_ZCAwhite for the input x_i in equation (9) yields the convolution feature map a'_{i,k} obtained by convolving the PCA feature map of the emotion image to be recognized with the kth convolution kernel of the convolutional layer;
then substituting a'_{i,k} for a_{i,k} in equation (11) yields the pooled feature map c' of the emotion image to be recognized, i.e. the high-level emotion features;
(3-2-2) when the pooled feature map c' of the emotion image to be recognized passes through the Dropout layer, averaging is applied to c' by scaling it with the retention probability:
\mathrm{DropoutTest}(c') = (1 - p) \times c'    (23)
where DropoutTest(c') is the data matrix obtained after the pooled feature map c' of the emotion image to be recognized passes through the Dropout layer;
(3-2-3) using the hypothesis function h_θ(x) of the Softmax regression layer to calculate the probability that c' belongs to each expression category j, and outputting the category j with the maximum probability as the classification result.
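Putting steps (3-2-2) and (3-2-3) together, the test-time path could be sketched as below; dropout_test implements equation (23), and the flattening of c' and the parameter matrix theta are assumptions of this sketch.

```python
import numpy as np

def dropout_test(c, p=0.5):
    # eq. (23): scale the pooled features by (1 - p) at test time
    return (1.0 - p) * c

def predict_emotion(c, theta, p=0.5):
    """Steps (3-2-2)/(3-2-3): scale, score with Softmax, return the label 1..7."""
    x = dropout_test(c, p).ravel()            # flatten the pooled feature map
    scores = theta @ x                        # theta_j^T x for each emotion class
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    return int(np.argmax(probs)) + 1          # category j with the largest probability
```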
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710714001.6A CN107506722A (en) | 2017-08-18 | 2017-08-18 | One kind is based on depth sparse convolution neutral net face emotion identification method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107506722A true CN107506722A (en) | 2017-12-22 |
Family
ID=60692255
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710714001.6A Pending CN107506722A (en) | 2017-08-18 | 2017-08-18 | One kind is based on depth sparse convolution neutral net face emotion identification method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107506722A (en) |
Cited By (38)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108154504A (en) * | 2017-12-25 | 2018-06-12 | 浙江工业大学 | Method for detecting surface defects of steel plate based on convolutional neural network |
CN108182260A (en) * | 2018-01-03 | 2018-06-19 | 华南理工大学 | A kind of Multivariate Time Series sorting technique based on semantic selection |
CN108460329A (en) * | 2018-01-15 | 2018-08-28 | 任俊芬 | A kind of face gesture cooperation verification method based on deep learning detection |
CN108491835A (en) * | 2018-06-12 | 2018-09-04 | 常州大学 | Binary channels convolutional neural networks towards human facial expression recognition |
CN108597539A (en) * | 2018-02-09 | 2018-09-28 | 桂林电子科技大学 | Speech-emotion recognition method based on parameter migration and sound spectrograph |
CN108614875A (en) * | 2018-04-26 | 2018-10-02 | 北京邮电大学 | Chinese emotion tendency sorting technique based on global average pond convolutional neural networks |
CN108711150A (en) * | 2018-05-22 | 2018-10-26 | 电子科技大学 | A kind of end-to-end pavement crack detection recognition method based on PCA |
CN108764128A (en) * | 2018-05-25 | 2018-11-06 | 华中科技大学 | A kind of video actions recognition methods based on sparse time slice network |
CN108806667A (en) * | 2018-05-29 | 2018-11-13 | 重庆大学 | The method for synchronously recognizing of voice and mood based on neural network |
CN108846380A (en) * | 2018-04-09 | 2018-11-20 | 北京理工大学 | A kind of facial expression recognizing method based on cost-sensitive convolutional neural networks |
CN108875904A (en) * | 2018-04-04 | 2018-11-23 | 北京迈格威科技有限公司 | Image processing method, image processing apparatus and computer readable storage medium |
CN108898105A (en) * | 2018-06-29 | 2018-11-27 | 成都大学 | It is a kind of based on depth characteristic and it is sparse compression classification face identification method |
CN108985457A (en) * | 2018-08-22 | 2018-12-11 | 北京大学 | A kind of deep neural network construction design method inspired by optimization algorithm |
CN109033994A (en) * | 2018-07-03 | 2018-12-18 | 辽宁工程技术大学 | A kind of facial expression recognizing method based on convolutional neural networks |
CN109409219A (en) * | 2018-09-19 | 2019-03-01 | 湖北工业大学 | Indoor occupant locating and tracking algorithm based on depth convolutional network |
CN109635790A (en) * | 2019-01-28 | 2019-04-16 | 杭州电子科技大学 | A kind of pedestrian's abnormal behaviour recognition methods based on 3D convolution |
CN109685126A (en) * | 2018-12-17 | 2019-04-26 | 北斗航天卫星应用科技集团有限公司 | Image classification method and image classification system based on depth convolutional neural networks |
CN109815953A (en) * | 2019-01-30 | 2019-05-28 | 电子科技大学 | One kind being based on vehicle annual test target vehicle identification matching system |
CN109934132A (en) * | 2019-02-28 | 2019-06-25 | 北京理工大学珠海学院 | Face identification method, system and storage medium based on random drop convolved data |
CN109993290A (en) * | 2017-12-30 | 2019-07-09 | 北京中科寒武纪科技有限公司 | Integrated circuit chip device and Related product |
CN110046223A (en) * | 2019-03-13 | 2019-07-23 | 重庆邮电大学 | Film review sentiment analysis method based on modified convolutional neural networks model |
CN110210380A (en) * | 2019-05-30 | 2019-09-06 | 盐城工学院 | The analysis method of personality is generated based on Expression Recognition and psychology test |
CN110223712A (en) * | 2019-06-05 | 2019-09-10 | 西安交通大学 | A kind of music emotion recognition method based on two-way convolution loop sparse network |
CN110276189A (en) * | 2019-06-27 | 2019-09-24 | 电子科技大学 | A kind of method for authenticating user identity based on gait information |
US20200011668A1 (en) * | 2018-07-09 | 2020-01-09 | Samsung Electronics Co., Ltd. | Simultaneous location and mapping (slam) using dual event cameras |
CN110705621A (en) * | 2019-09-25 | 2020-01-17 | 北京影谱科技股份有限公司 | Food image identification method and system based on DCNN and food calorie calculation method |
CN110765809A (en) * | 2018-07-25 | 2020-02-07 | 北京大学 | Facial expression classification method and device and emotion intelligent robot |
CN110807420A (en) * | 2019-10-31 | 2020-02-18 | 天津大学 | Facial expression recognition method integrating feature extraction and deep learning |
WO2020097936A1 (en) * | 2018-11-16 | 2020-05-22 | 华为技术有限公司 | Neural network compressing method and device |
WO2020164271A1 (en) * | 2019-02-13 | 2020-08-20 | 平安科技(深圳)有限公司 | Pooling method and device for convolutional neural network, storage medium and computer device |
CN112036433A (en) * | 2020-07-10 | 2020-12-04 | 天津城建大学 | CNN-based Wi-Move behavior sensing method |
CN112149449A (en) * | 2019-06-26 | 2020-12-29 | 北京华捷艾米科技有限公司 | Face attribute recognition method and system based on deep learning |
CN112329701A (en) * | 2020-11-20 | 2021-02-05 | 北京联合大学 | Facial expression recognition method for low-resolution images |
CN112613552A (en) * | 2020-12-18 | 2021-04-06 | 北京工业大学 | Convolutional neural network emotion image classification method combining emotion category attention loss |
CN113673567A (en) * | 2021-07-20 | 2021-11-19 | 华南理工大学 | Panorama emotion recognition method and system based on multi-angle subregion self-adaption |
US11651202B2 (en) | 2017-12-30 | 2023-05-16 | Cambricon Technologies Corporation Limited | Integrated circuit chip device and related product |
US11704544B2 (en) | 2017-12-30 | 2023-07-18 | Cambricon Technologies Corporation Limited | Integrated circuit chip device and related product |
US11734548B2 (en) | 2017-12-30 | 2023-08-22 | Cambricon Technologies Corporation Limited | Integrated circuit chip device and related product |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106778444A (en) * | 2015-11-23 | 2017-05-31 | 广州华久信息科技有限公司 | A kind of expression recognition method based on multi views convolutional neural networks |
CN105512624A (en) * | 2015-12-01 | 2016-04-20 | 天津中科智能识别产业技术研究院有限公司 | Smile face recognition method and device for human face image |
CN105447473A (en) * | 2015-12-14 | 2016-03-30 | 江苏大学 | PCANet-CNN-based arbitrary attitude facial expression recognition method |
CN106503654A (en) * | 2016-10-24 | 2017-03-15 | 中国地质大学(武汉) | A kind of face emotion identification method based on the sparse autoencoder network of depth |
CN106529503A (en) * | 2016-11-30 | 2017-03-22 | 华南理工大学 | Method for recognizing face emotion by using integrated convolutional neural network |
CN106778657A (en) * | 2016-12-28 | 2017-05-31 | 南京邮电大学 | Neonatal pain expression classification method based on convolutional neural networks |
Non-Patent Citations (6)
Title |
---|
ALI MOLLAHOSSEINI 等: "Going Deeper in Facial Expression Recognition using Deep Neural Networks", 《2016 IEEE WINTER CONFERENCE ON APPLICATION OF COMPUTER VISION》 * |
YU PING et al.: "Convolutional neural network image recognition algorithm based on matrix 2-norm pooling", Journal of Graphics * |
WU ZHENGWEN: "Research and application of convolutional neural networks in image classification", China Master's Theses Full-text Database, Information Science and Technology * |
SUN XIAO et al.: "Facial expression recognition based on ROI-KNN convolutional neural network", Acta Automatica Sinica * |
LI HUIKE: "Analysis and research of facial expression recognition methods", China Master's Theses Full-text Database, Information Science and Technology * |
CHU MINNAN: "Research on image classification technology based on convolutional neural networks", China Master's Theses Full-text Database, Information Science and Technology * |
Cited By (56)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108154504A (en) * | 2017-12-25 | 2018-06-12 | 浙江工业大学 | Method for detecting surface defects of steel plate based on convolutional neural network |
US11704544B2 (en) | 2017-12-30 | 2023-07-18 | Cambricon Technologies Corporation Limited | Integrated circuit chip device and related product |
CN109993290B (en) * | 2017-12-30 | 2021-08-06 | 中科寒武纪科技股份有限公司 | Integrated circuit chip device and related product |
US11734548B2 (en) | 2017-12-30 | 2023-08-22 | Cambricon Technologies Corporation Limited | Integrated circuit chip device and related product |
US11710031B2 (en) | 2017-12-30 | 2023-07-25 | Cambricon Technologies Corporation Limited | Parallel processing circuits for neural networks |
US11651202B2 (en) | 2017-12-30 | 2023-05-16 | Cambricon Technologies Corporation Limited | Integrated circuit chip device and related product |
CN109993290A (en) * | 2017-12-30 | 2019-07-09 | 北京中科寒武纪科技有限公司 | Integrated circuit chip device and Related product |
CN108182260A (en) * | 2018-01-03 | 2018-06-19 | 华南理工大学 | A kind of Multivariate Time Series sorting technique based on semantic selection |
CN108460329A (en) * | 2018-01-15 | 2018-08-28 | 任俊芬 | A kind of face gesture cooperation verification method based on deep learning detection |
CN108597539B (en) * | 2018-02-09 | 2021-09-03 | 桂林电子科技大学 | Speech emotion recognition method based on parameter migration and spectrogram |
CN108597539A (en) * | 2018-02-09 | 2018-09-28 | 桂林电子科技大学 | Speech-emotion recognition method based on parameter migration and sound spectrograph |
CN108875904A (en) * | 2018-04-04 | 2018-11-23 | 北京迈格威科技有限公司 | Image processing method, image processing apparatus and computer readable storage medium |
CN108846380A (en) * | 2018-04-09 | 2018-11-20 | 北京理工大学 | A kind of facial expression recognizing method based on cost-sensitive convolutional neural networks |
CN108846380B (en) * | 2018-04-09 | 2021-08-24 | 北京理工大学 | Facial expression recognition method based on cost-sensitive convolutional neural network |
CN108614875B (en) * | 2018-04-26 | 2022-06-07 | 北京邮电大学 | Chinese emotion tendency classification method based on global average pooling convolutional neural network |
CN108614875A (en) * | 2018-04-26 | 2018-10-02 | 北京邮电大学 | Chinese emotion tendency sorting technique based on global average pond convolutional neural networks |
CN108711150A (en) * | 2018-05-22 | 2018-10-26 | 电子科技大学 | A kind of end-to-end pavement crack detection recognition method based on PCA |
CN108711150B (en) * | 2018-05-22 | 2022-03-25 | 电子科技大学 | End-to-end pavement crack detection and identification method based on PCA |
CN108764128A (en) * | 2018-05-25 | 2018-11-06 | 华中科技大学 | A kind of video actions recognition methods based on sparse time slice network |
CN108806667A (en) * | 2018-05-29 | 2018-11-13 | 重庆大学 | The method for synchronously recognizing of voice and mood based on neural network |
CN108491835A (en) * | 2018-06-12 | 2018-09-04 | 常州大学 | Binary channels convolutional neural networks towards human facial expression recognition |
CN108898105A (en) * | 2018-06-29 | 2018-11-27 | 成都大学 | It is a kind of based on depth characteristic and it is sparse compression classification face identification method |
CN109033994B (en) * | 2018-07-03 | 2021-08-10 | 辽宁工程技术大学 | Facial expression recognition method based on convolutional neural network |
CN109033994A (en) * | 2018-07-03 | 2018-12-18 | 辽宁工程技术大学 | A kind of facial expression recognizing method based on convolutional neural networks |
US10948297B2 (en) * | 2018-07-09 | 2021-03-16 | Samsung Electronics Co., Ltd. | Simultaneous location and mapping (SLAM) using dual event cameras |
US20200011668A1 (en) * | 2018-07-09 | 2020-01-09 | Samsung Electronics Co., Ltd. | Simultaneous location and mapping (slam) using dual event cameras |
US11668571B2 (en) | 2018-07-09 | 2023-06-06 | Samsung Electronics Co., Ltd. | Simultaneous localization and mapping (SLAM) using dual event cameras |
CN110765809A (en) * | 2018-07-25 | 2020-02-07 | 北京大学 | Facial expression classification method and device and emotion intelligent robot |
CN108985457B (en) * | 2018-08-22 | 2021-11-19 | 北京大学 | Deep neural network structure design method inspired by optimization algorithm |
CN108985457A (en) * | 2018-08-22 | 2018-12-11 | 北京大学 | A kind of deep neural network construction design method inspired by optimization algorithm |
CN109409219A (en) * | 2018-09-19 | 2019-03-01 | 湖北工业大学 | Indoor occupant locating and tracking algorithm based on depth convolutional network |
WO2020097936A1 (en) * | 2018-11-16 | 2020-05-22 | 华为技术有限公司 | Neural network compressing method and device |
CN113302657B (en) * | 2018-11-16 | 2024-04-26 | 华为技术有限公司 | Neural network compression method and device |
CN113302657A (en) * | 2018-11-16 | 2021-08-24 | 华为技术有限公司 | Neural network compression method and device |
CN109685126A (en) * | 2018-12-17 | 2019-04-26 | 北斗航天卫星应用科技集团有限公司 | Image classification method and image classification system based on depth convolutional neural networks |
CN109635790A (en) * | 2019-01-28 | 2019-04-16 | 杭州电子科技大学 | A kind of pedestrian's abnormal behaviour recognition methods based on 3D convolution |
CN109815953A (en) * | 2019-01-30 | 2019-05-28 | 电子科技大学 | One kind being based on vehicle annual test target vehicle identification matching system |
WO2020164271A1 (en) * | 2019-02-13 | 2020-08-20 | 平安科技(深圳)有限公司 | Pooling method and device for convolutional neural network, storage medium and computer device |
CN109934132A (en) * | 2019-02-28 | 2019-06-25 | 北京理工大学珠海学院 | Face identification method, system and storage medium based on random drop convolved data |
CN110046223B (en) * | 2019-03-13 | 2021-05-18 | 重庆邮电大学 | Film evaluation emotion analysis method based on improved convolutional neural network model |
CN110046223A (en) * | 2019-03-13 | 2019-07-23 | 重庆邮电大学 | Film review sentiment analysis method based on modified convolutional neural networks model |
CN110210380A (en) * | 2019-05-30 | 2019-09-06 | 盐城工学院 | The analysis method of personality is generated based on Expression Recognition and psychology test |
CN110210380B (en) * | 2019-05-30 | 2023-07-25 | 盐城工学院 | Analysis method for generating character based on expression recognition and psychological test |
CN110223712B (en) * | 2019-06-05 | 2021-04-20 | 西安交通大学 | Music emotion recognition method based on bidirectional convolution cyclic sparse network |
CN110223712A (en) * | 2019-06-05 | 2019-09-10 | 西安交通大学 | A kind of music emotion recognition method based on two-way convolution loop sparse network |
CN112149449A (en) * | 2019-06-26 | 2020-12-29 | 北京华捷艾米科技有限公司 | Face attribute recognition method and system based on deep learning |
CN110276189A (en) * | 2019-06-27 | 2019-09-24 | 电子科技大学 | A kind of method for authenticating user identity based on gait information |
CN110705621A (en) * | 2019-09-25 | 2020-01-17 | 北京影谱科技股份有限公司 | Food image identification method and system based on DCNN and food calorie calculation method |
CN110807420A (en) * | 2019-10-31 | 2020-02-18 | 天津大学 | Facial expression recognition method integrating feature extraction and deep learning |
CN112036433B (en) * | 2020-07-10 | 2022-11-04 | 天津城建大学 | CNN-based Wi-Move behavior sensing method |
CN112036433A (en) * | 2020-07-10 | 2020-12-04 | 天津城建大学 | CNN-based Wi-Move behavior sensing method |
CN112329701A (en) * | 2020-11-20 | 2021-02-05 | 北京联合大学 | Facial expression recognition method for low-resolution images |
CN112613552B (en) * | 2020-12-18 | 2024-05-28 | 北京工业大学 | Convolutional neural network emotion image classification method combined with emotion type attention loss |
CN112613552A (en) * | 2020-12-18 | 2021-04-06 | 北京工业大学 | Convolutional neural network emotion image classification method combining emotion category attention loss |
CN113673567A (en) * | 2021-07-20 | 2021-11-19 | 华南理工大学 | Panorama emotion recognition method and system based on multi-angle subregion self-adaption |
CN113673567B (en) * | 2021-07-20 | 2023-07-21 | 华南理工大学 | Panorama emotion recognition method and system based on multi-angle sub-region self-adaption |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107506722A (en) | One kind is based on depth sparse convolution neutral net face emotion identification method | |
CN110532900B (en) | Facial expression recognition method based on U-Net and LS-CNN | |
CN108615010B (en) | Facial expression recognition method based on parallel convolution neural network feature map fusion | |
CN108537743B (en) | Face image enhancement method based on generation countermeasure network | |
CN112800903B (en) | Dynamic expression recognition method and system based on space-time diagram convolutional neural network | |
CN105205475B (en) | A kind of dynamic gesture identification method | |
WO2018107760A1 (en) | Collaborative deep network model method for pedestrian detection | |
CN110728209A (en) | Gesture recognition method and device, electronic equipment and storage medium | |
CN109255364A (en) | A kind of scene recognition method generating confrontation network based on depth convolution | |
CN107679491A (en) | A kind of 3D convolutional neural networks sign Language Recognition Methods for merging multi-modal data | |
CN110175501B (en) | Face recognition-based multi-person scene concentration degree recognition method | |
CN108710829A (en) | A method of the expression classification based on deep learning and the detection of micro- expression | |
CN108053398A (en) | A kind of melanoma automatic testing method of semi-supervised feature learning | |
CN106503687A (en) | The monitor video system for identifying figures of fusion face multi-angle feature and its method | |
CN109815826A (en) | The generation method and device of face character model | |
CN110889672A (en) | Student card punching and class taking state detection system based on deep learning | |
CN110378208B (en) | Behavior identification method based on deep residual error network | |
CN105740892A (en) | High-accuracy human body multi-position identification method based on convolutional neural network | |
CN106156765A (en) | safety detection method based on computer vision | |
CN114038037A (en) | Expression label correction and identification method based on separable residual attention network | |
CN115966010A (en) | Expression recognition method based on attention and multi-scale feature fusion | |
Borgalli et al. | Deep learning for facial emotion recognition using custom CNN architecture | |
CN106909938A (en) | Viewing angle independence Activity recognition method based on deep learning network | |
CN111028319A (en) | Three-dimensional non-photorealistic expression generation method based on facial motion unit | |
CN111401116B (en) | Bimodal emotion recognition method based on enhanced convolution and space-time LSTM network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20171222 |