CN112699815A - Dynamic expression recognition method and system based on space-time motion enhancement network
- Publication number: CN112699815A
- Application number: CN202011642743.0A
- Authority: CN (China)
- Prior art keywords: spatial, feature, expression, motion, expression recognition
- Prior art date: 2020-12-30
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Abstract
The invention discloses a dynamic expression recognition method and system based on a spatiotemporal motion enhancement network, comprising the following steps: acquiring a video sequence containing expression images; inputting the video sequence into a Resnet-Emotion network to obtain the spatial features of the expression; inputting the spatial features into a recursive refining unit network to obtain the motion features of the expression; and inputting the motion features into a gated recurrent unit network to obtain the temporal features of the expression, completing the recognition of the dynamic expression. Building on a Resnet-Emotion network pre-trained on expression datasets, the invention enhances the expression motion features of each video frame and mines the temporal information of the expression, solving the problem that current dynamic expression recognition methods ignore the motion characteristics of expression videos.
Description
Technical Field
The invention relates to the technical field of computer vision, and in particular to a dynamic expression recognition method and system based on a spatiotemporal motion enhancement network.
Background
With the continuous refinement of facial expression recognition theory and the development of artificial intelligence technology, the field of facial expression recognition has received extensive attention. Expression recognition is mainly divided into two approaches: methods based on static images and methods based on dynamic video. For the video-based approach, faces captured in natural environments are subject to varying illumination, occlusion, and head pose; if the recognition model can attend to the local motion details of facial expressions and highlight these motion-detail features, it will undoubtedly improve the accuracy of video expression recognition. In video, these local motion details appear as motion regions between consecutive frames, so how to use the motion characteristics in video to represent the continuity of facial expressions has become the focus and difficulty of dynamic expression recognition.
Summary of the Invention
In view of the deficiencies of the prior art, the purpose of the present invention is to provide a dynamic expression recognition method and system based on a spatiotemporal motion enhancement network, so as to solve the problem in the prior art that video expression recognition models ignore the motion characteristics of expression videos.
To achieve the above object, the technical scheme adopted by the present invention is as follows:
A dynamic expression recognition method, comprising the following steps:
acquiring a video sequence containing expression images;
inputting the video sequence into a Resnet-Emotion network to obtain the spatial features of the expression;
inputting the spatial features into a recursive refining unit network to obtain the motion features of the expression;
inputting the motion features into a gated recurrent unit network to obtain the temporal features of the expression, completing the recognition of the dynamic expression.
Further, the motion features are obtained as follows:
acquiring the original spatial features of the current frame in the video sequence, the original spatial features of the previous frame, and the motion features output for the previous frame by the recursive refining unit network;
inputting these three simultaneously into the recursive refining unit network, where they are processed by the update gate model to obtain an overall attention map;
inputting the overall attention map into a sigmoid function, whereupon the recursive refining unit network outputs the motion features of the current frame.
Further, the update gate model includes a transition layer, a spatial attention model, and a channel attention model; the spatial attention model and the channel attention model are both connected to the transition layer.
Further, the processing of the update gate model is as follows:
inputting the original spatial features of the current frame, the original spatial features of the previous frame, and the motion features output for the previous frame by the recursive refining unit network simultaneously into the transition layer, which produces a first spatial feature;
inputting the first spatial feature into the spatial attention model to obtain a spatial attention map;
inputting the first spatial feature into the channel attention model to obtain a channel attention map;
multiplying the spatial attention map by the channel attention map to obtain the overall attention map.
Further, the transition layer includes a convolutional layer, a batch normalization layer, and a ReLU activation function arranged in sequence; the spatial attention model includes a global cross-channel average pooling layer, a first fully connected layer, and a second fully connected layer arranged in sequence; the channel attention model includes a global spatial average pooling layer and a fully connected layer arranged in sequence.
Further, the spatial attention map is:

Z_s = W_2^s · Relu(W_1^s · Z̃_s)

where Z_s is the spatial attention map, W_1^s and W_2^s are the parameters of the first and second fully connected layers respectively, Relu is the ReLU activation function, and Z̃_s is the feature after the global cross-channel average pooling layer;

and the channel attention map is:

Z_c = W_c · Z̃_c

where Z_c is the channel attention map, W_c is the parameter of the fully connected layer, and Z̃_c is the feature after the global spatial average pooling layer.
Further, the gated recurrent unit network includes an update gate, a reset gate, a candidate state, and an output gate arranged in sequence.
Further, the update gate is:

z_t = σ(W_z · [h_{t-1}, x_t])

and the reset gate is:

r_t = σ(W_r · [h_{t-1}, x_t])

where z_t and r_t denote the update gate and the reset gate respectively, h_{t-1} denotes the output at the previous time step, x_t denotes the input at the current time step, W_z and W_r denote the update gate and reset gate parameter weights, and σ denotes the sigmoid function.
A dynamic expression recognition system, the system comprising:
an acquisition module, configured to acquire a video sequence containing expression images;
a first input module, configured to input the video sequence into a Resnet-Emotion network to obtain the spatial features of the expression;
a second input module, configured to input the spatial features into a recursive refining unit network to obtain the motion features of the expression;
a third input module, configured to input the motion features into a gated recurrent unit network to obtain the temporal features of the expression and complete the recognition of the dynamic expression.
A dynamic expression recognition system, the system comprising a processor and a storage medium;
the storage medium is configured to store instructions;
the processor is configured to operate according to the instructions to perform the steps of the method described above.
Compared with the prior art, the beneficial effects achieved by the present invention are as follows:
The present invention uses the pre-trained residual network Resnet-Emotion, based on transfer learning, and feeds it video sequences of consecutive frames, transferring static facial expression recognition technology to dynamic expression recognition. Combined with the recursive refining unit, it uses attention models to enhance the expression motion features of each video frame, solving the problem that dynamic expression recognition models ignore the motion characteristics of expression videos. By using a recurrent neural network to obtain temporal information from the expression sequence, it effectively helps the network model the expression sequence over time.
Description of the Drawings
Figure 1 is the flow chart of the present invention;
Figure 2 is a schematic diagram of the spatiotemporal motion enhancement network structure;
Figure 3 is a schematic diagram of the structure of the recursive refining unit.
Detailed Description
The present invention is further described below with reference to the accompanying drawings. The following embodiments are only intended to illustrate the technical solutions of the present invention more clearly and cannot be used to limit its scope of protection.
As shown in Figures 1 and 2, the dynamic expression recognition method based on the spatiotemporal motion enhancement network first uses the Resnet-Emotion network pre-trained on expression datasets to extract the spatial features of facial expressions, then uses a Refining Recurrent Unit (RRU) to enhance the expression motion features of each video frame, and finally uses a GRU to mine the temporal information of the expressions. The method specifically includes the following steps:
S1, use the pre-trained Resnet-Emotion network based on transfer learning to extract the spatial information of expressions from the video sequence.
Obtaining the spatial features of expressions in the video sequence includes the following steps:
S11, construct a video sequence containing multiple expression images.
Since the number of expression frames detected in each video differs, when the number of frames to read is set to n, the following strategy is adopted (a minimal sketch of this policy follows). First check the number of expression frames N in the sample. If N ≥ n, randomly select a starting position in [0, N-n] and read n consecutive expression frames. If N < n, read from the first frame through the N-th frame, then repeat the N-th expression frame a further n-N times; this guarantees that the frames read remain temporally coherent.
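Expressed as code, this could look like the following minimal Python sketch; the function and argument names (sample_frames, frames, n) are hypothetical, and frames is assumed to hold the detected expression frames in temporal order.

```python
import random

def sample_frames(frames, n):
    """Select n temporally coherent expression frames (illustrative sketch)."""
    N = len(frames)
    if N >= n:
        # Randomly choose a start index in [0, N-n] and read
        # n consecutive frames, preserving temporal continuity.
        start = random.randint(0, N - n)
        return frames[start:start + n]
    # Fewer than n frames available: read all N frames, then repeat
    # the N-th (last) frame n-N times to pad the clip to length n.
    return list(frames) + [frames[-1]] * (n - N)
```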
S12, perform feature extraction on the expression images in the video sequence with the Resnet-Emotion feature extraction module to obtain a feature vector of dimension T×512×H×W, i.e., the spatial features of the facial expressions.
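The Resnet-Emotion weights themselves are specific to this invention, so as a hedged stand-in the sketch below uses a torchvision ResNet-18 truncated before global average pooling, which likewise yields 512-channel spatial maps per frame; the clip length, input resolution, and backbone choice are assumptions for illustration only.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

# Stand-in for the expression-pretrained Resnet-Emotion backbone:
# ResNet-18 with its global pooling and classifier removed, so each
# frame maps to a 512 x H x W feature map (H = W = 7 for 224 x 224 input).
backbone = nn.Sequential(*list(resnet18().children())[:-2])

clip = torch.randn(2, 16, 3, 224, 224)   # (batch, T, C, H, W), assumed shape
b, t = clip.shape[:2]
feats = backbone(clip.flatten(0, 1))     # (batch*T, 512, 7, 7)
feats = feats.view(b, t, 512, 7, 7)      # per-clip T x 512 x H x W features
```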
S2, use the recursive refining unit to enhance the expression motion features of each video frame.
The spatial features of facial expressions obtained in S1 are input into the RRU in chronological order for feature refinement, which specifically includes the following steps:
S21, as shown in Figure 3, the spatial features input into the recursive refining unit at each step comprise the following three parts: the original spatial features X_{i,k,t-1} of the previous frame, the motion features S_{i,k,t-1} output by the RRU for the previous frame, and the original spatial features X_{i,k,t} of the current frame.
S22, input the original spatial features of the previous frame, the motion features output by the RRU for the previous frame, and the original spatial features of the current frame simultaneously into the recursive refining unit. These features pass through the update gate model of the recursive refining unit, which then outputs the refined spatial features S_{i,k,t}.
The update gate model of the recursive refining unit processes the above features as follows:
S221, the update gate model includes a transition layer, a spatial attention model, and a channel attention model arranged in sequence; the original spatial features of the previous frame, the motion features output by the RRU for the previous frame, and the original spatial features of the current frame pass in turn through the transition layer, the spatial attention model, and the channel attention model.
The transition layer consists of a convolutional layer with C 1×1 filters, a batch normalization (BN) layer, and a ReLU activation function. It integrates information across channels at each pixel location, while preserving the motion information of each spatial position and reducing the feature dimensionality.
The original spatial features of the previous frame, the motion features output by the RRU for the previous frame, and the original spatial features of the current frame are passed through the convolutional layer, the batch normalization layer, and the ReLU activation function in sequence to obtain the first spatial feature Z_t ∈ R^{C×H×W}.
The spatial attention model includes a global cross-channel average pooling layer and two FC layers.
To make the network pay more attention to the important spatial locations in the feature map, a spatial attention model is used. The first spatial feature Z_t ∈ R^{C×H×W} is first input into the global cross-channel average pooling layer to obtain the overall response at each spatial position, and then passed through the two FC layers in turn to generate the spatial attention map Z_s ∈ R^{1×H×W}:

Z_s = W_2^s · Relu(W_1^s · Z̃_s)

where W_1^s and W_2^s are the parameters of the two FC layers, Relu is the ReLU activation function, and Z̃_s is the feature after the global cross-channel average pooling layer.
The channel attention model includes a global spatial average pooling layer and an FC (fully connected) layer.
For a neural network, attending only to the spatial response of features is not sufficient. Within the same feature tensor, the feature maps of different channels differ in how useful they are, so a channel attention model is introduced. The first spatial feature Z_t ∈ R^{C×H×W} is first input into a global spatial average pooling layer to obtain the overall response of each channel, and then input into the FC (fully connected) layer to compute the channel attention map Z_c ∈ R^{C×1×1}:

Z_c = W_c · Z̃_c

where W_c is the parameter of the fully connected layer and Z̃_c is the feature after the global spatial average pooling layer.
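As an illustration, the two attention models might be sketched in PyTorch as follows; the module names and the reduction ratio r of the spatial branch are assumptions, since the text above only fixes the pooling and FC structure.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Global cross-channel average pooling + two FC layers -> Z_s."""
    def __init__(self, h, w, r=4):
        super().__init__()
        self.fc1 = nn.Linear(h * w, (h * w) // r)   # W_1^s
        self.fc2 = nn.Linear((h * w) // r, h * w)   # W_2^s

    def forward(self, z):                     # z: (B, C, H, W)
        b, _, h, w = z.shape
        s = z.mean(dim=1).flatten(1)          # cross-channel average: (B, H*W)
        s = self.fc2(torch.relu(self.fc1(s)))
        return s.view(b, 1, h, w)             # Z_s: (B, 1, H, W)

class ChannelAttention(nn.Module):
    """Global spatial average pooling + one FC layer -> Z_c."""
    def __init__(self, c):
        super().__init__()
        self.fc = nn.Linear(c, c)             # W_c

    def forward(self, z):                     # z: (B, C, H, W)
        b, c = z.shape[:2]
        s = z.mean(dim=(2, 3))                # spatial average: (B, C)
        return self.fc(s).view(b, c, 1, 1)    # Z_c: (B, C, 1, 1)
```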
After the spatial attention map and the channel attention map are obtained, they are multiplied together to produce the overall attention map of the motion features. Finally, the overall attention map is passed through a sigmoid function, which normalizes it to the range 0 to 1; the output Z of the update gate is:

Z = σ(Z_s ⊙ Z_c)
The motion features S_{i,k,t} finally output by the RRU module are:

S_{i,k,t} = (1 - Z) ⊙ S_{i,k,t-1} + Z ⊙ X_{i,k,t}
The value at each spatial position of Z represents the probability that the activation value at the corresponding position of the current frame features X_{i,k,t} is retained. A higher probability indicates that the update gate judges the current features at that position to be of higher quality, so more of them should be kept; conversely, a lower probability indicates lower quality, so more of the refined features from the previous frame should be used. This approach effectively exploits the useful motion information of historical frames and improves the quality of frame-level features.
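Combining the transition layer, the two attention models sketched above, and this gated update, a hedged sketch of the whole recursive refining unit follows; channel-wise concatenation of the three inputs before the transition layer is an assumption, since the fusion step is not spelled out above.

```python
import torch
import torch.nn as nn

class RRU(nn.Module):
    """Recursive refining unit sketch, reusing the SpatialAttention and
    ChannelAttention modules from the previous snippet."""
    def __init__(self, c, h, w):
        super().__init__()
        # Transition layer: C 1x1 filters + BN + ReLU; the three inputs
        # are assumed concatenated along the channel axis (3C -> C).
        self.transition = nn.Sequential(
            nn.Conv2d(3 * c, c, kernel_size=1),
            nn.BatchNorm2d(c),
            nn.ReLU(inplace=True),
        )
        self.spatial = SpatialAttention(h, w)
        self.channel = ChannelAttention(c)

    def forward(self, x_t, x_prev, s_prev):
        # x_t = X_{i,k,t}, x_prev = X_{i,k,t-1}, s_prev = S_{i,k,t-1}
        z_t = self.transition(torch.cat([x_t, x_prev, s_prev], dim=1))
        gate = torch.sigmoid(self.spatial(z_t) * self.channel(z_t))  # Z
        return (1 - gate) * s_prev + gate * x_t                      # S_{i,k,t}
```

Refinement then runs frame by frame along the sequence, with the refined features of the first frame initialized from its original spatial features.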
S3, use a GRU (gated recurrent unit network) to mine the temporal information of expressions.
The internal structure of the GRU (gated recurrent unit network) comprises an update gate, a reset gate, a candidate state, and an output gate.
The forward propagation formulas of the GRU are as follows:

z_t = σ(W_z · [h_{t-1}, x_t])

r_t = σ(W_r · [h_{t-1}, x_t])

h̃_t = tanh(W_h · [r_t ⊙ h_{t-1}, x_t])

h_t = (1 - z_t) ⊙ h_{t-1} + z_t ⊙ h̃_t

where h_{t-1} denotes the output at the previous time step, x_t denotes the input at the current time step, h_t denotes the output at the current time step, h̃_t denotes the candidate state at the current time step, and z_t and r_t denote the update gate and the reset gate respectively.
The update gate controls the degree to which state information from the previous time step is carried into the current state; the larger the value of the update gate, the more previous state information is brought in. The reset gate controls how much information from the previous state is written into the current candidate state h̃_t; the smaller the reset gate, the less previous-state information is written. What is special about these two gating mechanisms is that they can preserve information in long sequences, without it being cleared over time or removed for being irrelevant to the prediction.
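For concreteness, a minimal cell implementing the four formulas above is sketched below, continuing the PyTorch sketches from S2; bias terms are omitted to mirror the equations, and in practice torch.nn.GRU applied to the pooled RRU features would serve the same role.

```python
import torch
import torch.nn as nn

class GRUCellSketch(nn.Module):
    """Minimal GRU cell matching the forward-propagation formulas above."""
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.W_z = nn.Linear(hidden_size + input_size, hidden_size, bias=False)
        self.W_r = nn.Linear(hidden_size + input_size, hidden_size, bias=False)
        self.W_h = nn.Linear(hidden_size + input_size, hidden_size, bias=False)

    def forward(self, x_t, h_prev):
        hx = torch.cat([h_prev, x_t], dim=-1)      # [h_{t-1}, x_t]
        z_t = torch.sigmoid(self.W_z(hx))          # update gate
        r_t = torch.sigmoid(self.W_r(hx))          # reset gate
        h_cand = torch.tanh(self.W_h(torch.cat([r_t * h_prev, x_t], dim=-1)))
        return (1 - z_t) * h_prev + z_t * h_cand   # h_t
```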
A dynamic expression recognition system based on a spatiotemporal motion enhancement network, the system comprising:
an acquisition module, configured to acquire a video sequence containing expression images;
a first input module, configured to input the video sequence into a Resnet-Emotion network to obtain the spatial features of the expression;
a second input module, configured to input the spatial features into a recursive refining unit network to obtain the motion features of the expression;
a third input module, configured to input the motion features into a gated recurrent unit network to obtain the temporal features of the expression and complete the recognition of the dynamic expression.
A dynamic expression recognition system based on a spatiotemporal motion enhancement network, the system comprising a processor and a storage medium;
the storage medium is configured to store instructions;
the processor is configured to operate according to the instructions to perform the steps of the method described above.
A computer-readable storage medium on which a computer program is stored, wherein when the program is executed by a processor, the steps of the method described above are implemented.
The present application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be loaded onto a computer or other programmable data processing device, such that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
The above are only embodiments of the present invention and are not intended to limit it. Any modifications, equivalent replacements, improvements, etc. made within the spirit and principles of the present invention fall within the scope of the claims of the present invention.
Claims (10)
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011642743.0A | 2020-12-30 | 2020-12-30 | Dynamic expression recognition method and system based on space-time motion enhancement network |
Publications (1)

Publication Number | Publication Date |
---|---|
CN112699815A | 2021-04-23 |
Family ID: 75514190
Family Applications (1)

Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011642743.0A | Dynamic expression recognition method and system based on space-time motion enhancement network | 2020-12-30 | 2020-12-30 |
Patent Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2017164478A1 (en) * | 2016-03-25 | 2017-09-28 | 한국과학기술원 | Method and apparatus for recognizing micro-expressions through deep learning analysis of micro-facial dynamics |
KR20170112857A (en) * | 2016-03-25 | 2017-10-12 | 한국과학기술원 | Method for recognizing subtle facial expression using deep learning based analysis of micro facial dynamics and apparatus therefor |
CN106919903A (en) * | 2017-01-19 | 2017-07-04 | 中国科学院软件研究所 | A kind of continuous mood tracking based on deep learning of robust |
CN108388876A (en) * | 2018-03-13 | 2018-08-10 | 腾讯科技(深圳)有限公司 | A kind of image-recognizing method, device and relevant device |
KR20190128933A (en) * | 2018-05-09 | 2019-11-19 | 연세대학교 산학협력단 | Emotion recognition apparatus and method based on spatiotemporal attention |
CN109190479A (en) * | 2018-08-04 | 2019-01-11 | 台州学院 | A kind of video sequence expression recognition method based on interacting depth study |
CN109117795A (en) * | 2018-08-17 | 2019-01-01 | 西南大学 | Neural network expression recognition method based on graph structure |
CN109522818A (en) * | 2018-10-29 | 2019-03-26 | 中国科学院深圳先进技术研究院 | A kind of method, apparatus of Expression Recognition, terminal device and storage medium |
US20190311188A1 (en) * | 2018-12-05 | 2019-10-10 | Sichuan University | Face emotion recognition method based on dual-stream convolutional neural network |
CN110781760A (en) * | 2019-05-24 | 2020-02-11 | 西安电子科技大学 | A method and device for facial expression recognition based on spatial attention |
CN110321805A (en) * | 2019-06-12 | 2019-10-11 | 华中科技大学 | A kind of dynamic expression recognition methods based on sequential relationship reasoning |
CN111401116A (en) * | 2019-08-13 | 2020-07-10 | 南京邮电大学 | Bimodal emotion recognition method based on augmented convolution and space-time LSTM network |
CN111523462A (en) * | 2020-04-22 | 2020-08-11 | 南京工程学院 | Video sequence facial expression recognition system and method based on self-attention enhanced CNN |
Non-Patent Citations (7)

Title |
---|
YIHENG LIU: "Spatial and Temporal Mutual Promotion for Video-based Person Re-identification", arXiv, pages 1-8 * |
何晓云; 许江淳; 史鹏坤; 陈文绪: "Video facial expression recognition based on an attention mechanism" (基于注意力机制的视频人脸表情识别), Information Technology (信息技术), no. 02 * |
姚潇: "Exploration of glottal characteristics and the vocal folds behavior for the speech under emotion", Neurocomputing, vol. 410 * |
潘仙张; 张石清; 郭文平: "Multimodal deep convolutional neural networks for video expression recognition" (多模深度卷积神经网络应用于视频表情识别), Optics and Precision Engineering (光学精密工程), no. 04 * |
王晓华; 潘丽娟; 彭穆子; 胡敏; 金春花; 任福继: "Video sequence expression recognition based on a hierarchical attention model" (基于层级注意力模型的视频序列表情识别), Journal of Computer-Aided Design & Computer Graphics (计算机辅助设计与图形学学报), no. 01 * |
胡敏; 张柯柯; 王晓华; 任福继: "Video facial expression recognition combining sliding-window dynamic time warping and CNN" (结合滑动窗口动态时间规整和CNN的视频人脸表情识别), Journal of Image and Graphics (中国图象图形学报), no. 08 * |
胡敏; 高永; 吴昊; 王晓华; 黄忠: "Video expression recognition fusing edge detection and recurrent neural networks" (融合边缘检测和递归神经网络的视频表情识别), Journal of Electronic Measurement and Instrumentation (电子测量与仪器学报), no. 07 * |
Legal Events

Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20210423 |