
CN110321805B - Dynamic expression recognition method based on time sequence relation reasoning - Google Patents

Dynamic expression recognition method based on time sequence relation reasoning

Info

Publication number
CN110321805B
CN110321805B
Authority
CN
China
Prior art keywords
layer
feature
expression
dynamic
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201910504061.4A
Other languages
Chinese (zh)
Other versions
CN110321805A (en)
Inventor
韩守东
刘文龙
杨子清
张宏亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology filed Critical Huazhong University of Science and Technology
Priority to CN201910504061.4A priority Critical patent/CN110321805B/en
Publication of CN110321805A publication Critical patent/CN110321805A/en
Application granted granted Critical
Publication of CN110321805B publication Critical patent/CN110321805B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174 Facial expression recognition
    • G06V40/176 Dynamic expression

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Biomedical Technology (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract


The invention discloses a dynamic expression recognition method based on time sequence relation reasoning, belonging to the field of dynamic expression recognition in image processing and machine vision. The method includes: performing multi-scale time sequence sparse sampling on an expression image sequence to obtain multiple expression sequence segments of different scales, and converting the expression sequence segments to a fixed size after data enhancement; constructing a dynamic expression recognition model comprising a multi-scale regional feature extraction network and a time sequence relation reasoning module; inputting the obtained expression sequence segments into the dynamic recognition model for training; and inputting the expression image sequence to be recognized into the trained dynamic expression recognition model to obtain the dynamic expression recognition result. The method of the invention can adapt to long time sequence input and better extract local facial region features, improving recognition accuracy; at the same time, it improves model performance while reducing the amount of computation.


Description

Dynamic expression recognition method based on time sequence relation reasoning
Technical Field
The invention belongs to the field of dynamic expression recognition in image processing and machine vision, and particularly relates to a dynamic expression recognition method based on time sequence relation reasoning.
Background
Facial expression recognition is an important research topic in the field of computer vision, and most research has focused on the static expression recognition task, which takes a single-frame expression image as its research object. However, a facial expression is a dynamic change process, and a single frame of facial expression image cannot completely capture a person's emotional change. By contrast, dynamic expression recognition, which takes an expression video or expression image sequence as its research object, can exploit rich texture and motion information relevant to expression change, and thus represent a person's emotional change process more completely. However, because of problems such as the small scale of expression datasets, unbalanced distribution, data annotation deviation, posture change, illumination change, differences in emotional expression, and the conflict between expressions and speaking, dynamic expression recognition still faces many challenges.
Practical dynamic expression recognition research mainly comprises two parts: expression sequence feature extraction and time sequence relation modeling, and in recent years research on dynamic expression recognition has achieved considerable success. At present, 3D convolutional neural networks are generally used to recognize dynamic expressions. This approach is simple and direct, but has the following problems and disadvantages: (1) on the input side, the common method densely samples video frames and takes 16 consecutive frame images as input to extract features from the image sequence, which greatly limits the length of the input sequence and cannot be applied to expression sequences with long time spans; (2) in expression image feature extraction, conventional methods generally adopt weight-shared convolution kernels to extract global features from the expression image, whereas the facial motions in different regions of the face carry different structure and texture information. Although the single-scale region layer adopts different convolution kernels to process different local regions in order to exploit local information, its designed local regions are of uniform size and cannot support local multi-scale region feature learning, so the local information of facial expression change is not fully exploited, which degrades subsequent recognition accuracy.
In general, existing dynamic expression recognition methods cannot adapt to long time sequence inputs, and their recognition accuracy is low because local facial region features are not fully exploited.
Disclosure of Invention
Aiming at the defects of the prior art, the invention aims to provide a dynamic expression recognition method based on time sequence relation reasoning, and aims to solve the problems that the existing dynamic expression recognition method cannot adapt to long time sequence input and has low recognition accuracy.
In order to achieve the above object, the present invention provides a dynamic expression recognition method based on time sequence relation reasoning, which comprises:
(1) carrying out multi-scale time sequence sparse sampling on the expression image sequence to obtain a plurality of expression sequence segments with different scales, and carrying out data enhancement on the expression sequence segments and then converting the expression sequence segments into fixed sizes;
(2) constructing a dynamic expression recognition model;
the dynamic expression recognition model comprises a multi-scale regional feature extraction network and a time sequence relation reasoning module which are sequentially connected;
the multi-scale regional feature extraction network comprises a first feature layer, a second feature layer, a third feature layer, a fourth feature layer, a fifth feature layer and a sixth feature layer connected in sequence;
the first feature layer comprises a squeeze-excitation feature extraction module and a multi-scale region module connected in sequence; the squeeze-excitation feature extraction module is used for extracting features from the input image to obtain a feature map; the multi-scale region module comprises a convolution layer and three region layers of different scales; the convolution layer is used for performing a convolution operation on the feature map output by the squeeze-excitation feature extraction module; each region layer is used for dividing the feature map output by the convolution layer into a plurality of regions of fixed size and convolving each region with a different convolution kernel;
the second feature layer comprises a squeeze-excitation feature extraction module and a multi-scale region module connected in sequence, and is used for extracting features again from the feature map output by the first feature layer to obtain a feature map containing richer information;
the third, fourth and fifth feature layers each consist of a squeeze-excitation feature extraction module, and are used for extracting features from the feature map output before the current layer to obtain higher-level features;
the sixth feature layer is a mean pooling layer used for reducing the dimension of the features output by the fifth feature layer to obtain semantic features of the expression images;
the time sequence relation reasoning module is used for constructing time sequence relations between adjacent expression image frames from the semantic features of the expression images output by the multi-scale regional feature extraction network;
(3) inputting the expression sequence segments obtained in step (1) into the dynamic expression recognition model for training to obtain a trained dynamic expression recognition model;
(4) and inputting the expression image sequence to be recognized into the trained dynamic expression recognition model to obtain a dynamic expression recognition result.
Further, the data enhancement in step (1) includes random horizontal flipping and random cropping.
Further, the data-enhanced expression sequence segments are converted into a fixed size of 224 × 224 pixels in step (1).
Further, the convolution layer in the multi-scale region module and the three region layers of different scales form a residual structure.
Further, the three region layers of different scales divide the feature map into 8 × 8, 4 × 4 and 2 × 2 region blocks, respectively.
Further, the squeeze-excitation feature extraction module comprises a depthwise separable convolution submodule and a squeeze-excitation submodule;
the depthwise separable convolution submodule comprises a depthwise convolution layer and an ordinary convolution layer with a kernel size of 1 × 1; the squeeze-excitation submodule comprises a global mean pooling layer, a first fully-connected layer, a nonlinear activation layer, a second fully-connected layer, a sigmoid activation layer and a scale normalization layer.
Further, the time sequence relation reasoning module comprises a first perceptron layer, a second perceptron layer and a third perceptron layer;
the first perceptron layer has 512 nodes, the second perceptron layer has 256 nodes, and the third perceptron layer has as many nodes as there are expression categories.
Further, the loss function of the dynamic expression recognition model is:
$$\mathcal{L} = -\sum_{i=1}^{C} y_i \log G_i$$
where $C$ denotes the total number of categories of the expression sequence, $y_i$ denotes the ground-truth label category, and $G_i$ denotes the normalized posterior probability of category $i$ output by the time sequence relation reasoning module.
Through the above technical scheme, compared with the prior art, the invention achieves the following beneficial effects:
(1) The input expression images are subjected to multi-scale sparse sampling to obtain multiple expression sequence segments of different temporal scales. The method is therefore applicable to expression images with long time sequences and, for expressions that change at different speeds, can capture the complete fluctuation process from stable expression, through fluctuation, back to stable expression, so that the features of the acquired images are closer to the real expression change process, improving the accuracy of subsequent expression recognition.
(2) The multi-scale regional feature extraction network constructed by the invention can better extract local facial region features and enhance the network's ability to represent expression image features; meanwhile, the network uses the depthwise separable convolution module and the multi-scale region module constructed by the invention, which, compared with traditional convolution modules, reduce the amount of computation while improving model performance.
(3) The invention adopts the time sequence relation reasoning module to construct time sequence relations between expression image frames; compared with a long short-term memory (LSTM) network, it not only speeds up model training but also improves the accuracy of dynamic expression recognition.
Drawings
FIG. 1 is a flow chart of a dynamic expression recognition method based on timing relationship reasoning according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a multi-scale regional feature extraction network structure provided by an embodiment of the present invention;
FIG. 3 is a block diagram of a single-scale region layer provided by an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a multi-scale region module provided in an embodiment of the present invention;
FIG. 5 shows dynamic expression recognition results obtained by the method of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Referring to FIG. 1, the dynamic expression recognition method based on time sequence relation reasoning provided by the present invention includes:
(1) carrying out multi-scale time sequence sparse sampling on the expression image sequence to obtain a plurality of expression sequence segments with different scales, and carrying out data enhancement on the expression sequence segments and then converting the expression sequence segments into fixed sizes;
Specifically, the invention performs multi-scale sparse sampling on the input expression image sequence to obtain multiple expression sequence segments of different temporal scales. This is applicable to expression images with long time sequences and, for expressions that change at different speeds, can also capture the complete fluctuation process of an expression from stable, through fluctuation, back to stable, so that the features of the acquired images are closer to the real expression change process. As shown in FIG. 1, for an expression image sequence of 12 frames: when sampling 2 frames, the sequence is divided equally into 2 parts and 1 frame is randomly sampled from each part; when sampling 3 frames, the sequence is divided equally into 3 parts and 1 frame is randomly sampled from each part; similarly, when sampling 4 frames, the sequence is divided equally into 4 parts before random sampling. Data enhancement, including random horizontal flipping and random cropping, is then applied to the acquired expression image sequences, and the enhanced expression sequence segments are converted to a fixed size of 224 × 224 pixels.
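The sampling scheme can be sketched in a few lines of Python. This is a minimal illustration, not code from the patent: the function name is hypothetical, the scale set (2, 3, 4) follows the 12-frame example above, and leftover frames are simply ignored when the sequence length is not evenly divisible by a scale.

```python
import random

def multiscale_sparse_sample(num_frames, scales=(2, 3, 4)):
    # For each temporal scale k, split the frame index range [0, num_frames)
    # into k equal parts and draw one random frame index from each part.
    snippets = []
    for k in scales:
        part = num_frames // k
        snippets.append([random.randrange(i * part, (i + 1) * part)
                         for i in range(k)])
    return snippets

# Example with the 12-frame sequence of FIG. 1:
# multiscale_sparse_sample(12) -> e.g. [[3, 9], [1, 6, 10], [2, 4, 8, 11]]
```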
(2) Constructing a dynamic expression recognition model;
The dynamic expression recognition model comprises a multi-scale regional feature extraction network and a time sequence relation reasoning module connected in sequence. As shown in FIG. 2, the multi-scale regional feature extraction network comprises a first feature layer (layer1), a second feature layer (layer2), a third feature layer (layer3), a fourth feature layer (layer4), a fifth feature layer (layer5) and a sixth feature layer (layer6) connected in sequence.
the first feature layer1 comprises a squeezing excitation feature extraction module and a multi-scale region module which are sequentially connected; the device comprises an extrusion excitation feature extraction module, a feature extraction module and a feature extraction module, wherein the extrusion excitation feature extraction module is used for extracting features of an input image to obtain a feature map; the extrusion excitation feature extraction module comprises a depth separable convolution submodule and an extrusion excitation submodule; the depth separable convolution submodule comprises a depth convolution layer and a common convolution layer with convolution kernel size of 1 x 1;
In this way, decomposing the standard convolution into a depthwise convolution and a point-wise convolution substantially reduces the amount of computation and the number of parameters compared with the standard convolution operation. Assume the input feature map has size $D_F \times D_F \times M$ and is processed by $N$ standard convolution kernels of size $D_K \times D_K \times M$; the computation of the standard convolution is then $D_K \cdot D_K \cdot M \cdot N \cdot D_F \cdot D_F$. For the same input, a depthwise separable convolution requires $D_K \cdot D_K \cdot M \cdot D_F \cdot D_F + M \cdot N \cdot D_F \cdot D_F$ operations. The computation is therefore reduced by the factor
$$\frac{D_K \cdot D_K \cdot M \cdot D_F \cdot D_F + M \cdot N \cdot D_F \cdot D_F}{D_K \cdot D_K \cdot M \cdot N \cdot D_F \cdot D_F} = \frac{1}{N} + \frac{1}{D_K^2}.$$
The depthwise separable convolution submodule thus reduces the model's computation and parameters, while the squeeze-excitation submodule enhances the model's feature representation capability by using a lightweight gating mechanism to adaptively select channel importance, thereby improving model performance.
Structurally, the squeeze-excitation submodule comprises a global mean pooling layer, a first fully-connected layer, a nonlinear activation layer, a second fully-connected layer, a sigmoid activation layer and a scale normalization layer. Its function can be divided into two parts: global information embedding and adaptive recalibration. To address the problem that a convolution kernel operates only on a local receptive field and cannot use semantic information outside that field, global information embedding compresses the global spatial information of each channel into a scalar through a squeeze operation; this scalar can be regarded as the importance weight of the corresponding channel. Adaptive recalibration then multiplies the original input channel-wise by these importance weights to obtain a new feature output; this process is called excitation.
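The following PyTorch sketch illustrates one way to assemble the squeeze-excitation feature extraction module described above (depthwise separable convolution followed by squeeze-excitation). The class name, channel reduction ratio (16) and the batch-normalization layer are illustrative assumptions, not details specified in the patent.

```python
import torch
import torch.nn as nn

class SEFeatureModule(nn.Module):
    def __init__(self, in_ch, out_ch, stride=1, reduction=16):
        super().__init__()
        # Depthwise separable convolution: depthwise 3x3 + pointwise 1x1
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)   # assumption, not stated in the patent
        self.relu = nn.ReLU(inplace=True)
        # Squeeze-excitation: global mean pooling -> FC -> ReLU -> FC -> sigmoid
        self.squeeze = nn.AdaptiveAvgPool2d(1)
        self.excite = nn.Sequential(
            nn.Linear(out_ch, out_ch // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(out_ch // reduction, out_ch),
            nn.Sigmoid(),
        )

    def forward(self, x):
        x = self.relu(self.bn(self.pointwise(self.depthwise(x))))
        b, c, _, _ = x.shape
        w = self.excite(self.squeeze(x).view(b, c)).view(b, c, 1, 1)
        return x * w   # scale normalization: channel-wise reweighting
```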
The facial motions in different regions of the face have different structure and texture information, so when extracting features with convolution kernels, different convolution kernels should process different local regions. The structure of the existing single-scale region layer is shown in FIG. 3; because its designed local regions are of uniform size, it cannot support local multi-scale region feature learning. On this basis, a multi-scale region module is designed, whose structure is shown in FIG. 4: it comprises a convolution layer and three region layers of different scales. The convolution layer performs a convolution operation on the feature map output by the depthwise separable convolution module; each region layer divides the feature map output by the convolution layer into a plurality of regions of fixed size and convolves each region with a different convolution kernel, yielding region features at multiple scales. In the invention, the three region layers divide the feature map output by the convolution layer into 8 × 8, 4 × 4 and 2 × 2 region blocks, respectively. In addition, to avoid the vanishing-gradient problem, the multi-scale region module adopts a residual structure, summing the features output by the convolution layer with the multi-scale region features output by the three region layers, thereby combining local and global features.
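A PyTorch sketch of this design is given below; the class names are illustrative, and the feature map height and width are assumed to be divisible by each grid size (8, 4 and 2).

```python
import torch
import torch.nn as nn

class RegionLayer(nn.Module):
    # Splits the feature map into grid x grid blocks and convolves each
    # block with its own, non-shared 3x3 kernel.
    def __init__(self, channels, grid):
        super().__init__()
        self.grid = grid
        self.convs = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False)
            for _ in range(grid * grid))

    def forward(self, x):
        _, _, h, w = x.shape
        bh, bw = h // self.grid, w // self.grid
        rows = []
        for i in range(self.grid):
            row = [self.convs[i * self.grid + j](
                       x[:, :, i * bh:(i + 1) * bh, j * bw:(j + 1) * bw])
                   for j in range(self.grid)]
            rows.append(torch.cat(row, dim=3))
        return torch.cat(rows, dim=2)

class MultiScaleRegionModule(nn.Module):
    # One shared convolution plus region layers at 8x8, 4x4 and 2x2,
    # combined through a residual sum of global and regional features.
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.regions = nn.ModuleList(RegionLayer(channels, g) for g in (8, 4, 2))

    def forward(self, x):
        y = self.conv(x)
        return y + sum(r(y) for r in self.regions)
```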
The second feature layer comprises a squeeze-excitation feature extraction module and a multi-scale region module connected in sequence, and is used for extracting features again from the feature map output by the first feature layer to obtain a feature map containing richer information.
The third, fourth and fifth feature layers are squeeze-excitation feature extraction modules, used for extracting features from the feature map output before the current layer to obtain higher-level features.
The sixth feature layer is a mean pooling layer, used for reducing the dimension of the features output by the fifth feature layer to obtain the semantic features of the expression images.
The time sequence relation reasoning module is used for constructing time sequence relations between adjacent expression image frames from the semantic features of the expression images output by the multi-scale regional feature extraction network. It comprises a first perceptron layer, a second perceptron layer and a third perceptron layer; the first perceptron layer has 512 nodes, the second perceptron layer has 256 nodes, and the third perceptron layer has as many nodes as there are expression categories.
the timing relationship between given three frames of expression sequence images is defined as follows:
Figure BDA0002091180100000071
wherein, the input is an expression sequence V ═ f1,f2,...,fn},fiAnd fjRespectively representing the feature representation of the ith frame image and the jth frame image in the sequence, namely the feature output of a sixth feature layer6 in the multi-scale regional feature extraction network, gθFunction sum hφThe function is used for fusing different ordered frame features, and the first layer perceptron and the second perceptron are used for representing gθFunction, using a third tier perceptron to represent hφA function.
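A minimal PyTorch sketch of the time sequence relation reasoning module follows, using the perceptron layer sizes given above (512, 256, number of categories) and accumulating $g_\theta$ over all ordered frame triples $i < j < k$; the class name and the assumption that at least three frame features are provided are illustrative.

```python
import itertools
import torch
import torch.nn as nn

class TimeSequenceRelationModule(nn.Module):
    # g_theta: first (512-node) and second (256-node) perceptron layers;
    # h_phi: third perceptron layer with one node per expression category.
    def __init__(self, feat_dim, num_classes, n_relation_frames=3):
        super().__init__()
        self.n = n_relation_frames
        self.g_theta = nn.Sequential(
            nn.Linear(self.n * feat_dim, 512), nn.ReLU(inplace=True),
            nn.Linear(512, 256), nn.ReLU(inplace=True))
        self.h_phi = nn.Linear(256, num_classes)

    def forward(self, feats):                    # feats: (batch, T, feat_dim)
        rel = 0
        for idx in itertools.combinations(range(feats.shape[1]), self.n):
            triple = torch.cat([feats[:, i] for i in idx], dim=1)
            rel = rel + self.g_theta(triple)     # sum over ordered triples i < j < k
        return self.h_phi(rel)                   # class scores before normalization
```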
(3) Inputting the expression sequence segments obtained in the step (1) into the dynamic recognition model for training to obtain a trained dynamic expression recognition model;
Specifically, the parameters of the network are optimized using the stochastic gradient descent (SGD) algorithm. The category prediction loss of the time sequence reasoning module over the expression sequence is computed by cross entropy, with the loss function:
$$\mathcal{L} = -\sum_{i=1}^{C} y_i \log G_i$$
where $C$ denotes the total number of categories of the expression sequence, $y_i$ denotes the ground-truth label category, and $G_i$ denotes the normalized posterior probability of category $i$ output by the time sequence relation reasoning module.
The model parameters to be optimized comprise the multi-scale regional feature extraction network parameters $W$, the $g_\theta$ function weight parameter $\theta$, and the $h_\phi$ function weight parameter $\phi$.
If the parameter $W$ is optimized using a stochastic gradient descent algorithm, the gradient with respect to $W$ can be expressed as:
$$\frac{\partial \mathcal{L}(y, G)}{\partial W} = \frac{\partial \mathcal{L}}{\partial G} \sum_{k=1}^{K} \frac{\partial G}{\partial f_k}\,\frac{\partial f_k}{\partial W}$$
where $f_k$ denotes the feature of the $k$-th sampled frame and $K$ is the number of sampled frames.
by optimizing the parameter W in the above manner, it can be ensured that the learning of the parameter W is based on the whole expression sequence, rather than the partial expression sequence of a certain time period.
(4) And inputting the expression image sequence to be recognized into the trained dynamic expression recognition model to obtain a dynamic expression recognition result.
A visualization of some recognition results is shown in FIG. 5. Below each expression image sequence, the two classification results with the highest confidence produced by the proposed model are listed; the number after each classification result is its confidence. The first entry is the category the model finally predicts, and the second is the category the model most easily confuses it with.
To verify the effectiveness of the proposed dynamic expression recognition method, it is compared with existing mainstream methods on the same dataset, using parameter count, computation and overall accuracy as evaluation indices; the training and test sets used by the different methods on this dataset are exactly the same. The experimental results are shown in Table 1, where Baseline is the standard model algorithm provided for the dataset.
Table 1. Comparison of computational efficiency between different methods
From the comparison results, the proposed method, which is likewise a single-model method, achieves the highest overall accuracy among the compared methods while requiring less computation and fewer parameters. In particular, compared with MRE-CNN, the best-performing of the other methods, the proposed method is 1.09 percentage points more accurate while both its parameter count and its computation are reduced by two orders of magnitude. The method therefore effectively improves the accuracy of dynamic expression recognition while greatly reducing computation and parameter count.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (7)

1. A dynamic expression recognition method based on time sequence relation reasoning, characterized by comprising:

(1) performing multi-scale time sequence sparse sampling on an expression image sequence to obtain a plurality of expression sequence segments of different scales, and converting the expression sequence segments to a fixed size after data enhancement; the multi-scale time sequence sparse sampling specifically comprises: first dividing the expression image sequence equally into a plurality of parts, then sampling randomly from each part, thereby obtaining features at different temporal scales;

(2) constructing a dynamic expression recognition model;

the dynamic expression recognition model comprises a multi-scale regional feature extraction network and a time sequence relation reasoning module connected in sequence;

the multi-scale regional feature extraction network comprises a first feature layer, a second feature layer, a third feature layer, a fourth feature layer, a fifth feature layer and a sixth feature layer connected in sequence;

the first feature layer comprises a squeeze-excitation feature extraction module and a multi-scale region module connected in sequence; the squeeze-excitation feature extraction module is used for extracting features from the input image to obtain a feature map; the multi-scale region module comprises a convolution layer and three region layers of different scales; the convolution layer and the three region layers of different scales form a residual structure; the convolution layer is used for performing a convolution operation on the feature map output by the squeeze-excitation feature extraction module; the region layers are used for dividing the feature map output by the convolution layer into a plurality of regions of fixed size and convolving each region with a different convolution kernel;

the second feature layer comprises a squeeze-excitation feature extraction module and a multi-scale region module connected in sequence, and is used for extracting features again from the feature map output by the first feature layer to obtain a feature map containing richer information;

the third, fourth and fifth feature layers each consist of a squeeze-excitation feature extraction module, and are used for extracting features from the feature map output before the current layer to obtain higher-level features;

the sixth feature layer is a mean pooling layer used for reducing the dimension of the features output by the fifth feature layer to obtain semantic features of the expression images;

the time sequence relation reasoning module is used for constructing time sequence relations between adjacent expression image frames from the semantic features of the expression images output by the multi-scale regional feature extraction network;

(3) inputting the expression sequence segments obtained in step (1) into the dynamic expression recognition model for training to obtain a trained dynamic expression recognition model;

(4) inputting the expression image sequence to be recognized into the trained dynamic expression recognition model to obtain a dynamic expression recognition result.

2. The dynamic expression recognition method based on time sequence relation reasoning according to claim 1, wherein the data enhancement in step (1) comprises random horizontal flipping and random cropping.

3. The dynamic expression recognition method based on time sequence relation reasoning according to claim 1 or 2, wherein in step (1) the data-enhanced expression sequence segments are converted to a fixed size of 224 × 224 pixels.

4. The dynamic expression recognition method based on time sequence relation reasoning according to claim 1, wherein the three region layers of different scales divide the feature map into 8 × 8, 4 × 4 and 2 × 2 region blocks, respectively.

5. The dynamic expression recognition method based on time sequence relation reasoning according to claim 1, wherein the squeeze-excitation feature extraction module comprises a depthwise separable convolution submodule and a squeeze-excitation submodule;

the depthwise separable convolution submodule comprises a depthwise convolution layer and a convolution layer with a kernel size of 1 × 1; the squeeze-excitation submodule comprises a global mean pooling layer, a first fully-connected layer, a nonlinear activation layer, a second fully-connected layer, a sigmoid activation layer and a scale normalization layer.

6. The dynamic expression recognition method based on time sequence relation reasoning according to claim 1, wherein the time sequence relation reasoning module comprises a first perceptron layer, a second perceptron layer and a third perceptron layer;

the first perceptron layer has 512 nodes, the second perceptron layer has 256 nodes, and the third perceptron layer has as many nodes as there are expression categories.

7. The dynamic expression recognition method based on time sequence relation reasoning according to any one of claims 1-6, wherein the loss function of the dynamic expression recognition model is:

$$\mathcal{L} = -\sum_{i=1}^{C} y_i \log G_i$$

where $C$ denotes the total number of categories of the expression sequence, $y_i$ denotes the ground-truth label category, and $G_i$ denotes the normalized posterior probability of category $i$ output by the time sequence relation reasoning module.
CN201910504061.4A 2019-06-12 2019-06-12 Dynamic expression recognition method based on time sequence relation reasoning Expired - Fee Related CN110321805B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910504061.4A CN110321805B (en) 2019-06-12 2019-06-12 Dynamic expression recognition method based on time sequence relation reasoning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910504061.4A CN110321805B (en) 2019-06-12 2019-06-12 Dynamic expression recognition method based on time sequence relation reasoning

Publications (2)

Publication Number Publication Date
CN110321805A CN110321805A (en) 2019-10-11
CN110321805B true CN110321805B (en) 2021-08-10

Family

ID=68119461

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910504061.4A Expired - Fee Related CN110321805B (en) 2019-06-12 2019-06-12 Dynamic expression recognition method based on time sequence relation reasoning

Country Status (1)

Country Link
CN (1) CN110321805B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111325243B (en) * 2020-02-03 2023-06-16 天津大学 Visual relationship detection method based on regional attention learning mechanism
CN112101119A (en) * 2020-08-18 2020-12-18 东南大学 Natural scene dynamic expression recognition method and device based on EC-STFL loss function
CN112233353A (en) * 2020-09-24 2021-01-15 国网浙江兰溪市供电有限公司 Artificial intelligence-based anti-fishing monitoring system and monitoring method thereof
CN112699815A (en) * 2020-12-30 2021-04-23 常州码库数据科技有限公司 Dynamic expression recognition method and system based on space-time motion enhancement network
CN113392822B (en) * 2021-08-18 2021-10-29 华中科技大学 Facial motion unit detection method and system based on feature separation representation learning
CN114120445B (en) * 2021-11-18 2025-06-10 北京易达图灵科技有限公司 Behavior recognition method and device for dynamic information enhancement

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105139004A (en) * 2015-09-23 2015-12-09 河北工业大学 Face expression identification method based on video sequences
CN109409222A (en) * 2018-09-20 2019-03-01 中国地质大学(武汉) A kind of multi-angle of view facial expression recognizing method based on mobile terminal

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DK1558934T3 (en) * 2002-10-31 2013-10-07 Chemometec As Particle Assessment Method
CN108764063B (en) * 2018-05-07 2020-05-19 华中科技大学 Remote sensing image time-sensitive target identification system and method based on characteristic pyramid
CN108846355B (en) * 2018-06-11 2020-04-28 腾讯科技(深圳)有限公司 Image processing method, face recognition device and computer equipment
CN109509187B (en) * 2018-11-05 2022-12-13 中山大学 An Efficient Inspection Algorithm for Small Defects in Large Resolution Cloth Images
CN109801269B (en) * 2018-12-29 2023-08-22 华南理工大学 Tongue fur physique classification method based on competitive extrusion and excitation neural network

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105139004A (en) * 2015-09-23 2015-12-09 河北工业大学 Face expression identification method based on video sequences
CN109409222A (en) * 2018-09-20 2019-03-01 中国地质大学(武汉) A kind of multi-angle of view facial expression recognizing method based on mobile terminal

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Deep Region and Multi-label Learning for Facial Action Unit Detection;Kaili Zhao et al;《2016 IEEE conference on Computer Vision and Pattern Recognition》;20161212;第3391-3399页 *
Multi-view facial expression recognition based on an improved convolutional neural network; 钱勇生 et al.; Computer Engineering and Applications (计算机工程与应用); 2018-12-15; Vol. 54, No. 24; pp. 12-19 *
Research on facial expression recognition algorithms based on spatio-temporal features and deep learning models; 徐舒霖; China Master's Theses Full-text Database, Information Science and Technology (中国硕士优秀学位论文全文数据库信息科技辑); 2019-01-15 (No. 01); p. I138-2505 *

Also Published As

Publication number Publication date
CN110321805A (en) 2019-10-11

Similar Documents

Publication Publication Date Title
CN110321805B (en) Dynamic expression recognition method based on time sequence relation reasoning
CN109597891B (en) Text emotion analysis method based on bidirectional long-and-short-term memory neural network
CN110852383B (en) Target detection method and device based on attention mechanism deep learning network
CN111369563A (en) A Semantic Segmentation Method Based on Pyramid Atrous Convolutional Networks
CN109300121A (en) Method and system for constructing a diagnostic model of cardiovascular disease and the diagnostic model
CN111091045A (en) A Sign Language Recognition Method Based on Spatio-temporal Attention Mechanism
CN113221639A (en) Micro-expression recognition method for representative AU (AU) region extraction based on multitask learning
CN112818861A (en) Emotion classification method and system based on multi-mode context semantic features
US12079703B2 (en) Convolution-augmented transformer models
Huang et al. End-to-end continuous emotion recognition from video using 3D ConvLSTM networks
CN109993100A (en) Realization method of facial expression recognition based on deep feature clustering
CN113780249B (en) Expression recognition model processing method, device, equipment, medium and program product
CN115393933A (en) A video face emotion recognition method based on frame attention mechanism
CN111445545B (en) Text transfer mapping method and device, storage medium and electronic equipment
CN116503726A (en) Multi-scale light smoke image segmentation method and device
CN115019173A (en) Garbage identification and classification method based on ResNet50
CN114694174A (en) A human interaction behavior recognition method based on spatiotemporal graph convolution
CN114170657A (en) Facial emotion recognition method integrating attention mechanism and high-order feature representation
CN117593790A (en) Segment-level multi-scale action segmentation method based on expansion attention mechanism
CN116167014A (en) Multi-mode associated emotion recognition method and system based on vision and voice
Jadhav et al. Content based facial emotion recognition model using machine learning algorithm
CN115205309A (en) A method and device for extraterrestrial image segmentation based on semi-supervised learning
CN113850182A (en) Action identification method based on DAMR-3 DNet
CN114882412B (en) Annotated and associated short video emotion recognition method and system based on vision and language
CN113191408A (en) Gesture recognition method based on double-flow neural network

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20210810

CF01 Termination of patent right due to non-payment of annual fee