CN111695523A - Double-current convolutional neural network action identification method based on skeleton space-time and dynamic information - Google Patents
Info
- Publication number
- CN111695523A CN111695523A CN202010539760.5A CN202010539760A CN111695523A CN 111695523 A CN111695523 A CN 111695523A CN 202010539760 A CN202010539760 A CN 202010539760A CN 111695523 A CN111695523 A CN 111695523A
- Authority
- CN
- China
- Prior art keywords
- joint
- motion
- space
- skeleton
- information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/60—Rotation of whole images or parts thereof
- G06T3/604—Rotation of whole images or parts thereof using coordinate rotation digital computer [CORDIC] devices
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/20—Image enhancement or restoration using local operators
- G06T5/30—Erosion or dilatation, e.g. thinning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/30—Noise filtering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
Abstract
The invention discloses a dual-stream convolutional neural network action recognition method based on skeleton spatio-temporal and dynamic information. It belongs to the field of computer vision image and video processing and addresses the low recognition rate of skeleton-based action recognition methods in complex scenes. The key steps are: (1) inputting a skeleton sequence and performing a coordinate-system conversion on it; (2) constructing a skeleton spatio-temporal feature map and a joint motion velocity map from the converted coordinate information; (3) enhancing the features of the skeleton spatio-temporal feature map and the joint motion velocity map based on motion saliency and morphological operators, respectively; (4) classifying the action by deeply fusing the enhanced skeleton spatio-temporal feature map and joint motion velocity map with a dual-stream convolutional neural network. The method effectively improves action recognition accuracy in complex scenes with viewpoint changes, heavy noise, and subtly different actions.
Description
Technical Field
The invention belongs to the field of computer vision image and video processing, and relates to an action recognition method based on skeleton spatio-temporal and dynamic features combined with a two-stream convolutional neural network (Two-Stream CNN, TS-CNN).
Background
Human action recognition, a research hotspot in computer vision, has important application value in intelligent surveillance, human-computer interaction, video retrieval and other fields. It mainly faces the following technical difficulties. RGB-based methods are not robust to factors such as illumination changes and cluttered backgrounds. Depth images carry highly redundant information, which increases the computational complexity of the algorithms and limits their practical application. Because the raw skeleton information captured by depth sensors contains noise and the spatio-temporal information of the joints is ambiguous, effectively extracting motion information from three-dimensional skeleton data to recognize human actions remains a great challenge. Action recognition methods based on hand-crafted features extract only a single type of feature, so their recognition accuracy is limited and their generality is poor. Exploiting the good temporal modelling ability of RNNs, action recognition models have been built on RNNs, but an RNN cannot effectively express the spatial-domain relations between joints. Exploiting the powerful spatial-domain feature extraction ability of CNNs, action features have been extracted from images encoding the skeleton sequence; however, when traditional methods encode a skeleton sequence into a colour texture map, the following problems remain. First, each joint's information is encoded into the colour image independently, ignoring the related information between joints. Second, the spatial constraints between joints are ignored, so the joint spatial-domain information is disordered and the recognition accuracy is limited. Finally, only the static characteristics of the joints are considered; their dynamic characteristics and the different degrees to which individual joints participate in completing an action are ignored, so the encoding of the motion information is incomplete, the joint spatial-domain saliency information is lost, and the action recognition rate is limited.
Disclosure of Invention
To solve the above problems, the invention provides a dual-stream convolutional neural network action recognition method based on skeleton spatio-temporal and dynamic information, which addresses the low recognition rate of skeleton-based action recognition methods in complex scenes.
The invention adopts the following technical scheme: a dual-stream convolutional neural network action recognition method based on skeleton spatio-temporal and dynamic information, comprising the following steps:
(1) Input a skeleton sequence and perform a coordinate-system conversion on the obtained skeleton sequence.
(2) Construct a skeleton spatio-temporal feature map and a joint motion velocity map from the converted coordinate information, specifically:
(2.1) Encode the relative joint coordinates and the absolute joint coordinates into a skeleton spatio-temporal feature map under human-body structure constraints.
(2.2) Encode the joint velocity information at the same time step into a joint motion velocity map.
(3) Enhance the features of the skeleton spatio-temporal feature map and the joint motion velocity map based on motion saliency and a morphological operator, respectively.
(4) Classify the action by deeply fusing the enhanced skeleton spatio-temporal feature map and joint motion velocity map with the dual-stream convolutional neural network.
Further, step (1) is specifically as follows:
The skeleton sequences captured by the depth sensor lie in a Cartesian coordinate system with the camera as the origin; the three-dimensional skeleton coordinates are converted to a body coordinate system that effectively represents the spatial-domain information, as follows:
A body coordinate system is constructed with the hip joint, whose motion amplitude is small, as its origin. For a video sequence with N joint points and F frames, the joint coordinates are converted as

$\hat{p}_j^f = p_j^f - p_{hip}^f$

where $p_j^f$ and $\hat{p}_j^f$ are the coordinates of joint j in the f-th frame before and after the coordinate-system transformation, respectively, and $p_{hip}^f$ is the coordinate of the hip joint in the f-th frame.
Further, step (2) is specifically as follows:
In step (2.1), the absolute joint coordinates and the relative coordinates between joints are jointly encoded into a colour texture map to form the skeleton spatio-temporal feature map representing the spatio-temporal characteristics of the action, as follows:

$q_{j\_i}^f = \hat{p}_j^f - \hat{p}_i^f$

where $q_{j\_i}^f$ is the three-dimensional coordinate of the j-th joint relative to the i-th joint in the f-th frame and represents the spatial information of the bone connecting joints j and i; when i = 1, $q_{j\_1}^f$ is the absolute coordinate of the j-th joint.
The spatio-temporal feature of the j-th joint is then represented by the matrix $Q_{j\_i} = [q_{j\_i}^1, q_{j\_i}^2, \ldots, q_{j\_i}^F]$.
Only the first- and second-level related information with higher correlation is selected, given respectively by

$R_1=[Q_{h\_k},Q_{j\_i},\ldots,Q_{m\_n}], \quad R_2=[Q_{p\_o},Q_{u\_v},\ldots,Q_{y\_x}]$ (4)

where (h, k), (j, i) and (m, n) denote joint pairs connected by exactly one edge, and (p, o), (u, v) and (y, x) denote joint pairs connected by two edges.
The coordinate information is arranged according to the body structure: all joints of the body are divided into the following five groups: left arm, right arm, left leg, right leg and torso, each group arranged in the order of the physical connections between joints. The skeleton spatio-temporal feature map obtained with this coding order is

$E_k = [A, R_1, R_2]$

where k is the action class and A contains the absolute joint coordinates. Letting the three-dimensional coordinates correspond to the R, G and B channels respectively converts the skeleton spatio-temporal feature $E_k$ into a 72×F skeleton spatio-temporal feature map.
In step (2.2), the velocity information of each joint is extracted to represent the dynamic characteristics of the motion, and a feature descriptor representing the joint motion characteristics is constructed from the velocity scalar information. The velocity values of a joint along the x, y and z directions in frame f are

$v_x = \dfrac{x^{f+\Delta f}-x^{f}}{\Delta t}, \quad v_y = \dfrac{y^{f+\Delta f}-y^{f}}{\Delta t}, \quad v_z = \dfrac{z^{f+\Delta f}-z^{f}}{\Delta t}$

where $(x^{f+\Delta f}, y^{f+\Delta f}, z^{f+\Delta f})$ is the three-dimensional coordinate of the joint in frame f+Δf and Δf is the time step, with

$\Delta t = \Delta f / \mathrm{FPS}$

where FPS is the frame rate of the camera used.
Letting $v_x$, $v_y$ and $v_z$ correspond to the R, G and B channels respectively, the joint motion information is encoded as an N×(F−Δf) joint motion velocity map.
Further, step (3) is specifically as follows:
(3.1) The joint spatial-domain information with salient motion characteristics in the skeleton spatio-temporal feature map is enhanced based on motion energy, specifically:
During the k-th class action sequence, the instantaneous energy of joint i with coordinate $\hat{p}_i^f$ in the f-th frame is

$e_i^{f} = \lVert \hat{p}_i^{f} - \hat{p}_i^{f-1} \rVert$

where f is greater than 1 and ‖·‖ denotes the Euclidean distance. The motion energy of joint i over the whole action sequence is its accumulated instantaneous energy, min-max normalized as

$\varepsilon_i^{k} = \dfrac{\sum_{f=2}^{F} e_i^{f} - e_{\min}^{k}}{e_{\max}^{k} - e_{\min}^{k}}$

where $e_{\max}^{k}$ and $e_{\min}^{k}$ are the maximum and minimum values of the motion energy over all joints during the k-th class action sequence.
Following the coding order, the colour weights of all joints in the k-th class action, $\omega^{k} = [\varepsilon_1^{k}, \ldots, \varepsilon_N^{k}]$, are used as motion-enhancement weights, and the enhanced skeleton spatio-temporal feature map is obtained by scaling each joint's colour values in $E_k$ by its weight.
(3.2) The texture information of the joint motion velocity map is enhanced with a morphological operator to improve the velocity-estimation performance, as follows:
First, an erosion operation is applied to the joint motion velocity map to remove noise:

$X \ominus E = \{x \mid E_x \subseteq X\}$ (12)

where X is a binary image, ⊖ denotes the erosion operation and E is the structuring element. Applying formula (12) to the velocity values $v_x$, $v_y$ and $v_z$ of the joints along the x, y and z directions in frame f obtained in step (2.2) gives

$I_v = [\,v_x \ominus E \;\; v_y \ominus E \;\; v_z \ominus E\,]$ (13)

where $I_v$ denotes the joint motion velocity map after erosion.
A dilation operation is then applied to the eroded image:

$J_v = I_v \oplus E$

where $J_v$ denotes the joint motion velocity map after erosion and dilation, ⊖ denotes the erosion operation and ⊕ denotes the dilation operation.
Further, step (4) is specifically as follows:
The dual-stream convolutional neural network model is based on the AlexNet model, with the numbers of neurons (convolution kernels) in its first, third and fourth layers set to 64, 256 and 256, respectively. The skeleton spatio-temporal feature map and the joint motion velocity map are taken as the inputs of the static and dynamic streams, respectively; after processing by the convolutional, pooling and fully connected layers, the posterior probabilities produced by the single-stream CNNs are fused into the final recognition result.
Further, the skeleton spatio-temporal feature map and the joint motion velocity map are taken as the inputs of the static and dynamic streams, respectively, and after processing by the convolutional, pooling and fully connected layers, the posterior probabilities produced by the single-stream CNNs are fused into the final recognition result, as follows:
Given a skeleton sequence $S_m$, the skeleton spatio-temporal feature map and the joint motion velocity map are obtained by the above processing and scaled to 227×227 pixels by bilinear interpolation to facilitate the subsequent deep-feature extraction. The deep features extracted by the CNN are fed to the last fully connected layer and then normalized by a Softmax function to obtain the posterior probability

$P(n \mid x) = \dfrac{\exp(z_n)}{\sum_{i=1}^{N}\exp(z_i)}$

where $P(n \mid x)$ is the probability that the image x of the m-th skeleton sequence belongs to the n-th action class, $z_n$ is the input of the n-th neuron of the last fully connected layer, x denotes the skeleton spatio-temporal feature map or the joint motion velocity map, and N is the number of action classes.
For each class n, the dual-stream convolutional neural network model outputs $P_{SSTM}(n \mid x)$ and $P_{JMSM}(n \mid x)$, and multiplicative fusion is applied to the two stream outputs to obtain the final classification result:

ActionClass = Fin(Max(P_SSTM ⊙ P_JMSM)) (16)

where Fin(·) is the maximum-label function, Max(·) is the maximum operator, ⊙ is the Hadamard product operator, SSTM denotes the skeleton spatio-temporal feature map, JMSM denotes the joint motion velocity map, $P_{SSTM}$ is the softmax output value of the static stream and $P_{JMSM}$ is the softmax output value of the dynamic stream, each computed with the Softmax expression above for the corresponding stream.
has the advantages that: the invention is based on the action recognition of space-time and dynamic characteristics, and transforms the coordinate system of each type of action; constructing descriptors of skeleton space-time characteristics and motion characteristics; joint space domain information with obvious movement characteristics in the skeleton space-time characteristic diagram is enhanced, and a joint motion velocity diagram is enhanced by using a morphological operator to eliminate noise; and realizing action classification based on the enhanced bone space-time characteristic diagram and the joint motion velocity diagram of the deep fusion of the double-flow convolutional neural network. In the invention, because a relatively stable joint is selected as a coordinate origin to transform a skeleton sequence coordinate system, the obtained body coordinate system can effectively represent the related information between joints, and a skeleton space-time characteristic diagram is constructed by using the related information; body structure constraint is added when a skeleton sequence is coded, so that the recognition rate among different types of actions is greatly improved; in addition, after the dynamic skeleton information is added, the motion characteristic information is more comprehensively represented, so that the overall recognition rate of the invention is obviously improved; and finally, the difference between similar actions is reduced by enhancing the motion significance, and the error recognition rate between similar actions is reduced. Compared with the mainstream human body action identification method, the method has higher identification rate under the complex scenes of visual angle change, noise, main body diversity, similar action diversity and the like.
Drawings
FIG. 1 is a schematic flow chart of the main framework of the method of the invention.
FIG. 2 shows the skeleton coordinates in the Kinect coordinate system.
FIG. 3 is a visualization of the joints in the body coordinate system.
FIG. 4 shows the set of 25 human joints.
FIG. 5 compares the joint distance map and the proposed skeleton spatio-temporal feature map: (a1) the joint distance map; (a2) the skeleton spatio-temporal feature map.
FIG. 6 shows the image-enhanced colour texture maps: (b1) motion enhancement of the skeleton spatio-temporal feature map; (b2) visual enhancement of the joint motion velocity map.
FIG. 7 shows the dual-stream convolutional neural network model.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
In the invention, the flow of the dual-stream convolutional neural network action recognition method based on skeleton spatio-temporal and dynamic information is shown in FIG. 1, and the implementation steps are as follows:
(1) A coordinate-system conversion is applied to the skeleton sequence to obtain a body coordinate system with the hip joint as the coordinate origin.
The skeleton sequences captured by a depth sensor such as the Kinect lie in a Cartesian coordinate system with the camera as the origin, as shown in FIG. 2. To obtain a body coordinate system that effectively represents the spatial-domain information, the three-dimensional skeleton coordinates must be converted, specifically:
A body coordinate system is constructed with the hip joint, whose motion amplitude is small, as its origin. For a video sequence with N joint points and F frames, the joint coordinate transformation can be expressed as

$\hat{p}_j^f = p_j^f - p_{hip}^f$

where $p_j^f$ and $\hat{p}_j^f$ are the coordinates of joint j in the f-th frame before and after the coordinate-system transformation, respectively, and $p_{hip}^f$ is the coordinate of the hip joint in the f-th frame. A visualization of the joints after the transformation is shown in FIG. 3.
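A minimal sketch of this coordinate-system conversion is shown below, assuming the skeleton sequence is stored as an (F, N, 3) NumPy array and that the index of the hip joint is known; both the array layout and the hip index are assumptions of the sketch, not fixed by the invention.

```python
import numpy as np

def to_body_coordinates(skeleton, hip_index=0):
    """Convert a skeleton sequence from the camera (Kinect) coordinate system
    to a body coordinate system centred on the hip joint.

    skeleton: array of shape (F, N, 3) -- F frames, N joints, xyz coordinates.
    hip_index: index of the hip joint used as the new origin (assumed).
    """
    skeleton = np.asarray(skeleton, dtype=np.float64)
    hip = skeleton[:, hip_index:hip_index + 1, :]   # (F, 1, 3) hip coordinate per frame
    return skeleton - hip                           # subtract the hip from every joint
```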
(2) A skeleton spatio-temporal feature map and a joint motion velocity map are constructed from the converted coordinate information.
Step (2.1): The relative coordinates between joints and the absolute joint coordinates are jointly encoded into a colour texture map to form the skeleton spatio-temporal feature map representing the spatio-temporal characteristics of the action, as follows:

$q_{j\_i}^f = \hat{p}_j^f - \hat{p}_i^f$

where $q_{j\_i}^f$ denotes the three-dimensional coordinate of the j-th joint relative to the i-th joint in the f-th frame, which also represents the spatial information of the bone connecting joints j and i. Further, when i = 1, $q_{j\_1}^f$ is the absolute coordinate of the j-th joint, i.e. $q_{j\_1}^f = \hat{p}_j^f$.
On this basis, the spatio-temporal feature of the j-th joint can be represented by the matrix $Q_{j\_i} = [q_{j\_i}^1, q_{j\_i}^2, \ldots, q_{j\_i}^F]$.
In the invention, only the first- and second-level related information (i.e., joint pairs connected by only one or two edges) with higher correlation is selected, which reduces the computational complexity, reduces inter-class confusion and improves intra-class robustness. The first- and second-level related information is given respectively by

$R_1=[Q_{h\_k},Q_{j\_i},\ldots,Q_{m\_n}], \quad R_2=[Q_{p\_o},Q_{u\_v},\ldots,Q_{y\_x}]$ (21)

where (h, k), (j, i), (m, n), etc. denote joint pairs connected by exactly one edge, such as the left wrist and left elbow or the left ankle and left knee, and (p, o), (u, v), (y, x), etc. denote joint pairs connected by two edges, such as the left wrist and shoulder or the left foot and knee.
Because the receptive field of a CNN grows with the network depth, the spatial information between highly correlated joint pairs should be extracted in shallow layers, while the spatial information of weakly correlated pairs is acquired in deeper layers. The joint distance map, shown in (a1) of FIG. 5, arranges the joint information in the colour image in a fixed order and ignores the differences in relative spatial information. Instead, the coordinate information is arranged according to the body structure, and all joints are divided into the following five groups: left arm, right arm, left leg, right leg and torso, each group arranged in the order of the physical connections between joints, as shown in FIG. 4. Taking the right arm as an example, the joint points [25, 24, 12, 11, 10, 9] are adjacent in FIG. 4 and therefore highly correlated, so grouping them together allows the spatial relations between them to be extracted more effectively. On this basis, the resulting skeleton spatio-temporal feature map effectively encodes the spatio-temporal information of the joints, as shown in (a2) of FIG. 5.
The skeleton spatio-temporal feature map obtained from the encoded skeleton sequence is

$E_k = [A, R_1, R_2]$

where k is the action class and A contains the absolute coordinates of the joint points. Letting the three-dimensional coordinates correspond to the R, G and B channels respectively converts the skeleton spatio-temporal feature $E_k$ into a 72×F skeleton spatio-temporal feature map.
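The sketch below illustrates how such a skeleton spatio-temporal feature map could be assembled, assuming body-frame coordinates of shape (F, N, 3) and caller-supplied lists of first- and second-level joint pairs already ordered by body part; the pair lists, the row layout and the min-max colour scaling are assumptions of the sketch, so the row count depends on the pairs chosen.

```python
import numpy as np

def skeleton_spatiotemporal_map(body_coords, first_level_pairs, second_level_pairs):
    """Stack absolute coordinates (A) and first/second-level relative coordinates
    (R1, R2) as image rows, with frames as columns and x, y, z mapped to R, G, B.

    body_coords: (F, N, 3) joint coordinates in the body coordinate system.
    first_level_pairs / second_level_pairs: lists of (j, i) joint-index pairs.
    """
    rows = [body_coords[:, j, :] for j in range(body_coords.shape[1])]                    # A
    rows += [body_coords[:, j, :] - body_coords[:, i, :] for j, i in first_level_pairs]   # R1
    rows += [body_coords[:, j, :] - body_coords[:, i, :] for j, i in second_level_pairs]  # R2
    sstm = np.stack(rows, axis=0)                                # (rows, F, 3)
    lo, hi = sstm.min(), sstm.max()
    return np.uint8(255.0 * (sstm - lo) / (hi - lo + 1e-8))      # 8-bit colour texture map
```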
Step (2.2): The velocity information of each joint is extracted to represent the dynamic characteristics of the motion, and a feature descriptor representing the joint motion characteristics is constructed from the velocity scalar information. The velocity values of a joint along the x, y and z directions in frame f can be expressed as

$v_x = \dfrac{x^{f+\Delta f}-x^{f}}{\Delta t}, \quad v_y = \dfrac{y^{f+\Delta f}-y^{f}}{\Delta t}, \quad v_z = \dfrac{z^{f+\Delta f}-z^{f}}{\Delta t}$

where $(x^{f+\Delta f}, y^{f+\Delta f}, z^{f+\Delta f})$ is the three-dimensional coordinate of the joint in frame f+Δf and Δf is the time step, with

$\Delta t = \Delta f / \mathrm{FPS}$

where FPS is the frame rate of the Kinect camera used.
Letting $v_x$, $v_y$ and $v_z$ correspond to the R, G and B channels respectively, the joint motion information can be encoded as an N×(F−Δf) joint motion velocity map.
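A minimal sketch of this joint motion velocity map is given below; the 30 fps frame rate, the single-frame time step and the min-max colour scaling are assumptions of the sketch.

```python
import numpy as np

def joint_motion_velocity_map(body_coords, delta_f=1, fps=30.0):
    """Encode per-joint velocities as an (N, F - delta_f, 3) colour image,
    with v_x, v_y, v_z mapped to the R, G, B channels.

    body_coords: (F, N, 3) joint coordinates; delta_f: time step in frames;
    fps: camera frame rate (assumed value).
    """
    dt = delta_f / fps
    vel = (body_coords[delta_f:] - body_coords[:-delta_f]) / dt   # (F - delta_f, N, 3)
    vel = np.transpose(vel, (1, 0, 2))                            # joints as rows, frames as columns
    lo, hi = vel.min(), vel.max()
    return np.uint8(255.0 * (vel - lo) / (hi - lo + 1e-8))
```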
(3) The features of the skeleton spatio-temporal feature map and the joint motion velocity map are enhanced based on motion saliency and a morphological operator, respectively, which enlarges the inter-class differences between different actions and reduces the intra-class differences of the same action.
(3.1) The joint spatial-domain information with salient motion characteristics in the skeleton spatio-temporal feature map is enhanced based on motion energy, specifically:
During the k-th class action sequence, the instantaneous energy of joint i with coordinate $\hat{p}_i^f$ in the f-th frame is

$e_i^{f} = \lVert \hat{p}_i^{f} - \hat{p}_i^{f-1} \rVert$

where f is greater than 1 and ‖·‖ denotes the Euclidean distance. From this, the motion energy of joint i over the whole action sequence is its accumulated instantaneous energy, min-max normalized as

$\varepsilon_i^{k} = \dfrac{\sum_{f=2}^{F} e_i^{f} - e_{\min}^{k}}{e_{\max}^{k} - e_{\min}^{k}}$

where $e_{\max}^{k}$ and $e_{\min}^{k}$ are the maximum and minimum values of the motion energy over all joints during the k-th class action sequence.
Following the coding order, the colour weights of all joints in the k-th class action, $\omega^{k} = [\varepsilon_1^{k}, \ldots, \varepsilon_N^{k}]$, are used as motion-enhancement weights, and the enhanced skeleton spatio-temporal feature map is obtained by scaling each joint's colour values in $E_k$ by its weight.
As shown in (b1) of FIG. 6, the colours corresponding to the joint-related information with high motion energy are enhanced, while the colour information of joints with low motion energy is blurred; adopting this adaptive enhancement gives the skeleton spatio-temporal feature map a motion-saliency characteristic and thereby improves the action classification capability.
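The sketch below illustrates one way to realise this motion-saliency enhancement; the per-joint accumulation, the min-max normalisation and the mapping from joints to image rows are assumptions of the sketch.

```python
import numpy as np

def motion_energy_weights(body_coords):
    """Accumulated per-joint displacement (instantaneous energies summed over the
    sequence), min-max normalised over all joints (normalisation form assumed)."""
    step = np.linalg.norm(np.diff(body_coords, axis=0), axis=2)   # (F-1, N) Euclidean step lengths
    energy = step.sum(axis=0)                                     # accumulated motion energy per joint
    return (energy - energy.min()) / (energy.max() - energy.min() + 1e-8)

def enhance_sstm(sstm, weights, rows_per_joint):
    """Scale the colour of each joint's rows in the SSTM by its motion-energy weight.

    rows_per_joint: dict mapping each joint index to its row indices in the map
    (the row layout is an assumption of this sketch).
    """
    out = sstm.astype(np.float64)
    for joint, rows in rows_per_joint.items():
        out[rows] *= weights[joint]
    return np.uint8(np.clip(out, 0, 255))
```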
(3.2) The texture information of the motion feature map is enhanced with a morphological operator to improve the velocity-estimation performance. An erosion operation is first applied to the joint motion velocity map to remove noise, namely

$X \ominus E = \{x \mid E_x \subseteq X\}$

where X is a binary image, ⊖ denotes the erosion operation and E is the structuring element.
Applying this erosion operation to the $v_x$, $v_y$ and $v_z$ obtained in step (2.2) gives

$I_v = [\,v_x \ominus E \;\; v_y \ominus E \;\; v_z \ominus E\,]$ (30)

where $I_v$ denotes the joint motion velocity map after erosion.
A dilation operation is then applied to the eroded image to restore and smooth the original texture, which effectively reduces the intra-class velocity differences:

$J_v = I_v \oplus E$

where $J_v$ denotes the joint motion velocity map after erosion and dilation, ⊖ denotes the erosion operation and ⊕ denotes the dilation operation.
As shown in (b2) of FIG. 6, the texture of the enhanced image (second row) is smoother than that of the original image (first row); with the original texture essentially preserved, useless information is effectively removed, which reduces the differences within similar actions.
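A minimal sketch of this erosion-then-dilation enhancement (a morphological opening) using OpenCV is shown below; the 3×3 structuring element is an assumption of the sketch.

```python
import cv2
import numpy as np

def enhance_velocity_map(jmsm, kernel_size=3):
    """Erode each velocity channel to suppress noise, then dilate to restore
    and smooth the texture (I_v = erosion of v_x, v_y, v_z; J_v = dilation of I_v)."""
    kernel = np.ones((kernel_size, kernel_size), np.uint8)    # structuring element E
    channels = cv2.split(jmsm)                                # v_x, v_y, v_z channels
    eroded = [cv2.erode(c, kernel) for c in channels]         # I_v
    dilated = [cv2.dilate(c, kernel) for c in eroded]         # J_v
    return cv2.merge(dilated)
```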
(4) Action classification is achieved by deeply fusing the enhanced skeleton spatio-temporal feature map and joint motion velocity map with the dual-stream convolutional neural network.
The dual-stream convolutional neural network model is composed of two improved AlexNets; as shown in FIG. 7, the numbers of neurons (convolution kernels) in the first, third and fourth layers of AlexNet are changed from 96, 384 and 384 to 64, 256 and 256, respectively, forming the dual-stream convolutional neural network model of the invention.
The skeleton spatio-temporal feature map and the joint motion velocity map are taken as the inputs of the static and dynamic streams, respectively; after processing by the convolutional, pooling and fully connected layers, the posterior probabilities produced by the single-stream CNNs are fused into the final recognition result.
Given a skeleton sequence $S_m$, the skeleton spatio-temporal feature map and the joint motion velocity map are obtained by the above processing and scaled to 227×227 pixels by bilinear interpolation to facilitate the subsequent deep-feature extraction. The deep features extracted by the CNN are output to the last fully connected layer and then normalized by a Softmax function, yielding the posterior probability

$P(n \mid x) = \dfrac{\exp(z_n)}{\sum_{i=1}^{N}\exp(z_i)}$

where $P(n \mid x)$ is the probability that the image x of the m-th skeleton sequence belongs to the n-th action class, $z_n$ is the input of the n-th neuron of the last fully connected layer, x denotes the skeleton spatio-temporal feature map or the joint motion velocity map, and N is the number of action classes.
In the proposed model, for each class n the two streams output $P_{SSTM}(n \mid x)$ and $P_{JMSM}(n \mid x)$, and multiplicative fusion is applied to the stream outputs to obtain the final classification result:

ActionClass = Fin(Max(P_SSTM ⊙ P_JMSM)) (34)

where Fin(·) is the maximum-label function, Max(·) is the maximum operator, ⊙ is the Hadamard product operator, SSTM denotes the skeleton spatio-temporal feature map, JMSM denotes the joint motion velocity map, $P_{SSTM}$ is the softmax output value of the static stream and $P_{JMSM}$ is the softmax output value of the dynamic stream, each computed with the Softmax expression above for the corresponding stream.
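A sketch of one stream and of the multiplicative fusion in PyTorch is given below. The 64/256/256 kernel counts in the first, third and fourth convolution layers follow the modification described above, while the kernel sizes, strides, pooling layout and fully connected sizes follow standard AlexNet and are assumptions of the sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModifiedAlexNet(nn.Module):
    """One stream: an AlexNet-style CNN with 64, 256 and 256 kernels in the
    first, third and fourth convolution layers (instead of 96, 384, 384)."""
    def __init__(self, num_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(64, 256, kernel_size=5, padding=2), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(), nn.Linear(256 * 6 * 6, 4096), nn.ReLU(inplace=True),
            nn.Linear(4096, 4096), nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):                      # x: (batch, 3, 227, 227)
        return self.classifier(self.features(x))
```

One instance of this network would serve as the static stream (fed the skeleton spatio-temporal feature maps) and another as the dynamic stream (fed the joint motion velocity maps). The multiplicative (Hadamard) fusion of the two stream posteriors can then be sketched as follows.

```python
def fuse_and_classify(static_net, dynamic_net, sstm_batch, jmsm_batch):
    """Compute each stream's softmax posterior, take their element-wise product,
    and return the class with the largest fused score."""
    p_sstm = F.softmax(static_net(sstm_batch), dim=1)    # static-stream posterior P_SSTM
    p_jmsm = F.softmax(dynamic_net(jmsm_batch), dim=1)   # dynamic-stream posterior P_JMSM
    fused = p_sstm * p_jmsm                              # Hadamard product
    return fused.argmax(dim=1)                           # ActionClass = Fin(Max(...))
```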
the invention relates to a double-current convolution neural network action identification method based on skeleton space-time and dynamic information, which comprises the steps of firstly transforming a skeleton three-dimensional coordinate system to obtain coordinate information containing relative positions of joints; secondly, coding the related information among joints into a color texture map to construct a skeleton space-time feature descriptor, and considering the physical structure constraint of a human body to increase the difference among classes; then, estimating the velocity information of each joint, and coding the velocity information into a color texture map to obtain a skeleton motion characteristic descriptor; in addition, the obtained space-time and dynamic characteristics are respectively enhanced based on the motion significance and the morphological operator so as to further improve the characteristic expression capability; and finally, the enhanced bone space-time and dynamic characteristics are deeply fused through a double-flow convolutional neural network to realize action recognition. Aiming at complex scenes with visual angle change, rich noise, subtle difference action and the like, the method can effectively improve the action recognition accuracy.
The above is only a preferred embodiment of the present invention, but the protection scope of the present invention is not limited thereto; any equivalent substitution or modification of the technical solution and inventive concept of the present invention that a person skilled in the art can readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention.
Claims (6)
1. A dual-stream convolutional neural network action recognition method based on skeleton spatio-temporal and dynamic information, characterized by comprising the following steps:
(1) inputting a skeleton sequence and performing a coordinate-system conversion on the obtained skeleton sequence;
(2) constructing a skeleton spatio-temporal feature map and a joint motion velocity map from the converted coordinate information, specifically:
(2.1) encoding the relative joint coordinates and the absolute joint coordinates into a skeleton spatio-temporal feature map under human-body structure constraints;
(2.2) encoding the joint velocity information at the same time step into a joint motion velocity map;
(3) enhancing the features of the skeleton spatio-temporal feature map and the joint motion velocity map based on motion saliency and a morphological operator, respectively;
(4) classifying the action by deeply fusing the enhanced skeleton spatio-temporal feature map and joint motion velocity map with the dual-stream convolutional neural network.
2. The dual-stream convolutional neural network action recognition method based on skeleton spatio-temporal and dynamic information according to claim 1, characterized in that step (1) is specifically as follows:
the skeleton sequences captured by the depth sensor lie in a Cartesian coordinate system with the camera as the origin, and the three-dimensional skeleton coordinates are converted to a body coordinate system that effectively represents the spatial-domain information, as follows:
a body coordinate system is constructed with the hip joint, whose motion amplitude is small, as its origin, and for a video sequence with N joint points and F frames the joint coordinates are converted as $\hat{p}_j^f = p_j^f - p_{hip}^f$, where $p_j^f$ and $\hat{p}_j^f$ are the coordinates of joint j in the f-th frame before and after the transformation and $p_{hip}^f$ is the coordinate of the hip joint in the f-th frame.
3. The dual-stream convolutional neural network action recognition method based on skeleton spatio-temporal and dynamic information according to claim 1, characterized in that step (2) is specifically as follows:
in step (2.1), the absolute joint coordinates and the relative coordinates between joints are jointly encoded into a colour texture map to form the skeleton spatio-temporal feature map representing the spatio-temporal characteristics of the action, as follows:
$q_{j\_i}^f = \hat{p}_j^f - \hat{p}_i^f$
where $q_{j\_i}^f$ is the three-dimensional coordinate of the j-th joint relative to the i-th joint in the f-th frame and represents the spatial information of the bone connecting joints j and i; when i = 1, $q_{j\_1}^f$ is the absolute coordinate of the j-th joint;
the spatio-temporal feature of the j-th joint is represented by the matrix $Q_{j\_i} = [q_{j\_i}^1, q_{j\_i}^2, \ldots, q_{j\_i}^F]$;
only the first- and second-level related information with higher correlation is selected, given respectively by
$R_1=[Q_{h\_k},Q_{j\_i},\ldots,Q_{m\_n}], \quad R_2=[Q_{p\_o},Q_{u\_v},\ldots,Q_{y\_x}]$ (4)
where (h, k), (j, i) and (m, n) denote joint pairs connected by exactly one edge, and (p, o), (u, v) and (y, x) denote joint pairs connected by two edges;
the coordinate information is arranged according to the body structure, all joints of the body being divided into the following five groups: left arm, right arm, left leg, right leg and torso, each group arranged in the order of the physical connections between joints, and the skeleton spatio-temporal feature map obtained with this coding order is
$E_k = [A, R_1, R_2]$
where k is the action class and A contains the absolute joint coordinates; letting the three-dimensional coordinates correspond to the R, G and B channels respectively converts the skeleton spatio-temporal feature $E_k$ into a 72×F skeleton spatio-temporal feature map;
in step (2.2), the velocity information of each joint is extracted to represent the dynamic characteristics of the motion, and a feature descriptor representing the joint motion characteristics is constructed from the velocity scalar information, the velocity values of a joint along the x, y and z directions in frame f being
$v_x = \dfrac{x^{f+\Delta f}-x^{f}}{\Delta t}, \quad v_y = \dfrac{y^{f+\Delta f}-y^{f}}{\Delta t}, \quad v_z = \dfrac{z^{f+\Delta f}-z^{f}}{\Delta t}$
where $(x^{f+\Delta f}, y^{f+\Delta f}, z^{f+\Delta f})$ is the three-dimensional coordinate of the joint in frame f+Δf, Δf is the time step, and $\Delta t = \Delta f / \mathrm{FPS}$, where FPS is the frame rate of the camera used;
letting $v_x$, $v_y$ and $v_z$ correspond to the R, G and B channels respectively, the joint motion information is encoded as an N×(F−Δf) joint motion velocity map.
4. The dual-stream convolutional neural network action recognition method based on skeleton spatio-temporal and dynamic information according to claim 3, characterized in that step (3) is specifically as follows:
(3.1) the joint spatial-domain information with salient motion characteristics in the skeleton spatio-temporal feature map is enhanced based on motion energy, specifically:
during the k-th class action sequence, the instantaneous energy of joint i with coordinate $\hat{p}_i^f$ in the f-th frame is
$e_i^{f} = \lVert \hat{p}_i^{f} - \hat{p}_i^{f-1} \rVert$
where f is greater than 1 and ‖·‖ denotes the Euclidean distance, and the motion energy of joint i over the whole action sequence is its accumulated instantaneous energy, min-max normalized as
$\varepsilon_i^{k} = \dfrac{\sum_{f=2}^{F} e_i^{f} - e_{\min}^{k}}{e_{\max}^{k} - e_{\min}^{k}}$
where $e_{\max}^{k}$ and $e_{\min}^{k}$ are respectively the maximum and minimum values of the motion energy over all joints during the k-th class action sequence;
following the coding order, the colour weights of all joints in the k-th class action, $\omega^{k}$, are used as motion-enhancement weights, and the enhanced skeleton spatio-temporal feature map is obtained by scaling each joint's colour values in $E_k$ by its weight;
(3.2) the texture information of the joint motion velocity map is enhanced with a morphological operator to improve the velocity-estimation performance, as follows:
an erosion operation is first applied to the joint motion velocity map to remove noise:
$X \ominus E = \{x \mid E_x \subseteq X\}$ (12)
where X is a binary image, ⊖ denotes the erosion operation and E is the structuring element; applying formula (12) to the velocity values $v_x$, $v_y$ and $v_z$ of the joints along the x, y and z directions in frame f obtained in step (2.2) gives
$I_v = [\,v_x \ominus E \;\; v_y \ominus E \;\; v_z \ominus E\,]$ (13)
where $I_v$ denotes the joint motion velocity map after erosion;
a dilation operation is then applied to the eroded image:
$J_v = I_v \oplus E$
where $J_v$ denotes the joint motion velocity map after erosion and dilation, ⊖ denotes the erosion operation and ⊕ denotes the dilation operation.
5. The dual-stream convolutional neural network action recognition method based on skeleton spatio-temporal and dynamic information according to claim 1, characterized in that step (4) is specifically as follows:
the dual-stream convolutional neural network model is based on the AlexNet model, with the numbers of neurons in its first, third and fourth layers set to 64, 256 and 256, respectively; the skeleton spatio-temporal feature map and the joint motion velocity map are taken as the inputs of the static and dynamic streams, respectively, and after processing by the convolutional, pooling and fully connected layers, the posterior probabilities produced by the single-stream CNNs are fused into the final recognition result.
6. The dual-stream convolutional neural network action recognition method based on skeleton spatio-temporal and dynamic information according to claim 4, characterized in that the skeleton spatio-temporal feature map and the joint motion velocity map are taken as the inputs of the static and dynamic streams, respectively, and after processing by the convolutional, pooling and fully connected layers, the posterior probabilities produced by the single-stream CNNs are fused into the final recognition result, as follows:
given a skeleton sequence $S_m$, the skeleton spatio-temporal feature map and the joint motion velocity map are obtained by the above processing and scaled to 227×227 pixels by bilinear interpolation to facilitate the subsequent deep-feature extraction; the deep features extracted by the CNN are fed to the last fully connected layer and then normalized by a Softmax function to obtain the posterior probability
$P(n \mid x) = \dfrac{\exp(z_n)}{\sum_{i=1}^{N}\exp(z_i)}$
where $P(n \mid x)$ is the probability that the image x of the m-th skeleton sequence belongs to the n-th action class, $z_n$ is the input of the n-th neuron of the last fully connected layer, x denotes the skeleton spatio-temporal feature map or the joint motion velocity map, and N is the number of action classes;
for each class n, the dual-stream convolutional neural network model outputs $P_{SSTM}(n \mid x)$ and $P_{JMSM}(n \mid x)$, and multiplicative fusion is applied to the stream outputs to obtain the final classification result:
ActionClass = Fin(Max(P_SSTM ⊙ P_JMSM)) (16)
where Fin(·) is the maximum-label function, Max(·) is the maximum operator, ⊙ is the Hadamard product operator, SSTM denotes the skeleton spatio-temporal feature map, JMSM denotes the joint motion velocity map, $P_{SSTM}$ is the softmax output value of the static stream and $P_{JMSM}$ is the softmax output value of the dynamic stream, each computed with the Softmax expression above for the corresponding stream.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010539760.5A CN111695523B (en) | 2020-06-15 | 2020-06-15 | Double-flow convolutional neural network action recognition method based on skeleton space-time and dynamic information |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010539760.5A CN111695523B (en) | 2020-06-15 | 2020-06-15 | Double-flow convolutional neural network action recognition method based on skeleton space-time and dynamic information |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111695523A true CN111695523A (en) | 2020-09-22 |
CN111695523B CN111695523B (en) | 2023-09-26 |
Family
ID=72480940
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010539760.5A Active CN111695523B (en) | 2020-06-15 | 2020-06-15 | Double-flow convolutional neural network action recognition method based on skeleton space-time and dynamic information |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111695523B (en) |
Patent Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104038738A (en) * | 2014-06-04 | 2014-09-10 | 东北大学 | Intelligent monitoring system and intelligent monitoring method for extracting coordinates of human body joint |
CN105787439A (en) * | 2016-02-04 | 2016-07-20 | 广州新节奏智能科技有限公司 | Depth image human body joint positioning method based on convolution nerve network |
US20170347107A1 (en) * | 2016-05-26 | 2017-11-30 | Mstar Semiconductor, Inc. | Bit allocation method and video encoding device |
CN109460707A (en) * | 2018-10-08 | 2019-03-12 | 华南理工大学 | A kind of multi-modal action identification method based on deep neural network |
CN109670401A (en) * | 2018-11-15 | 2019-04-23 | 天津大学 | A kind of action identification method based on skeleton motion figure |
CN109919122A (en) * | 2019-03-18 | 2019-06-21 | 中国石油大学(华东) | A kind of timing behavioral value method based on 3D human body key point |
CN110188599A (en) * | 2019-04-12 | 2019-08-30 | 哈工大机器人义乌人工智能研究院 | A kind of human body attitude behavior intellectual analysis recognition methods |
CN110059662A (en) * | 2019-04-26 | 2019-07-26 | 山东大学 | A kind of deep video Activity recognition method and system |
CN110222568A (en) * | 2019-05-05 | 2019-09-10 | 暨南大学 | A kind of across visual angle gait recognition method based on space-time diagram |
CN110253583A (en) * | 2019-07-02 | 2019-09-20 | 北京科技大学 | The human body attitude robot teaching method and device of video is taken based on wearing teaching |
CN110929637A (en) * | 2019-11-20 | 2020-03-27 | 中国科学院上海微系统与信息技术研究所 | Image identification method and device, electronic equipment and storage medium |
Non-Patent Citations (2)
Title |
---|
WU Zhenzhen; DENG Huifang: "3D Human Action Recognition Using Skeleton Models and Grassmann Manifolds", Computer Engineering and Applications *
ZHENG Xiao; PENG Xiaodong; WANG Jiaxuan: "Human Action Recognition Method Based on Pose Spatio-Temporal Features" *
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112270246A (en) * | 2020-10-23 | 2021-01-26 | 泰康保险集团股份有限公司 | Video behavior identification method and device, storage medium and electronic equipment |
CN112270246B (en) * | 2020-10-23 | 2024-01-05 | 泰康保险集团股份有限公司 | Video behavior recognition method and device, storage medium and electronic equipment |
CN112861808A (en) * | 2021-03-19 | 2021-05-28 | 泰康保险集团股份有限公司 | Dynamic gesture recognition method and device, computer equipment and readable storage medium |
CN112861808B (en) * | 2021-03-19 | 2024-01-23 | 泰康保险集团股份有限公司 | Dynamic gesture recognition method, device, computer equipment and readable storage medium |
CN113011381A (en) * | 2021-04-09 | 2021-06-22 | 中国科学技术大学 | Double-person motion identification method based on skeleton joint data |
US11854305B2 (en) | 2021-05-09 | 2023-12-26 | International Business Machines Corporation | Skeleton-based action recognition using bi-directional spatial-temporal transformer |
CN114943987A (en) * | 2022-06-07 | 2022-08-26 | 首都体育学院 | Motion behavior knowledge graph construction method adopting PAMS motion coding |
Also Published As
Publication number | Publication date |
---|---|
CN111695523B (en) | 2023-09-26 |
Legal Events
Date | Code | Title | Description |
---|---|---|---
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |