CN109460707A - Multi-modal action recognition method based on a deep neural network - Google Patents
Multi-modal action recognition method based on a deep neural network
- Publication number
- CN109460707A
- Authority
- CN
- China
- Prior art keywords
- layer
- video
- neural network
- deep neural
- frame
- Prior art date: 2018-10-08
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
- G06V20/42—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items of sport video content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
- G06V40/23—Recognition of whole body movements, e.g. for sport training
- G06V40/25—Recognition of walking or running movements, e.g. gait recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
- G06V40/28—Recognition of hand or arm movements, e.g. recognition of deaf sign language
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Human Computer Interaction (AREA)
- Social Psychology (AREA)
- Psychiatry (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Biophysics (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Computing Systems (AREA)
- Molecular Biology (AREA)
- Evolutionary Computation (AREA)
- Data Mining & Analysis (AREA)
- Biomedical Technology (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a multi-modal action recognition method based on a deep neural network. The method jointly exploits multiple modalities, including video images, optical flow maps and human skeletons, as follows: first, the video is preprocessed and compressed; optical flow maps are computed from consecutive video frames; using a pose estimation algorithm, the human skeleton is extracted from the video frame by frame, and the path integral features of the frame sequence are computed; the resulting optical flow maps, skeleton path integral features and original video images are then fed into a deep neural network with a multi-branch structure, which learns an abstract spatio-temporal representation of human actions and correctly predicts the action class. In addition, a pooling layer based on an attention mechanism is inserted into the video image branch to strengthen the abstract features most relevant to the final action classification and to suppress irrelevant interference. By jointly exploiting multi-modal information, the invention achieves strong robustness and a high recognition rate.
Description
Technical field
The present invention relates to the technical field of image processing, and in particular to a multi-modal action recognition method based on a deep neural network.
Background art
Action recognition has recently become a very active research direction. By recognizing human actions in video, it can serve as a new form of interactive input for processing devices, and can be widely applied in everyday domains such as gaming and film. The action recognition task involves identifying different actions from video clips that may span the entire video. It is a natural extension of the image classification task: image recognition is performed on multiple video frames, and the per-frame predictions are then aggregated into a final action prediction.
Traditional video action recognition techniques tend to rely on hand-designed feature extractors to capture the spatio-temporal features of an action. With the rise of deep learning, such hand-crafted feature extractors have been replaced by deep convolutional neural networks. Although deep learning frameworks have achieved great success in image classification (e.g., ImageNet), progress in video classification and representation learning has been slow. The main reason is the enormous computational cost: a simple two-dimensional convolutional network for 101-class classification has only about 5M parameters, whereas expanding the same architecture into a three-dimensional structure grows it to about 33M parameters. Training a three-dimensional convolutional network (3D ConvNet) on UCF101 takes 3 to 4 days, and about 2 months on Sports-1M, which makes architecture exploration difficult and prone to overfitting.
Action recognition requires capturing spatio-temporal context across frames. In addition, the captured spatial information typically needs compensation for camera motion, and even strong spatial object detection alone cannot satisfy the demands of action recognition, because the fine-grained details carried by motion information remain unexploited. Better prediction requires capturing the motion information of both local and global context in the video.
Today's video action recognition techniques rely entirely on deep learning, among which the classical work is the two-stream convolutional neural network. The two-stream convolutional network is inspired by the two-stream hypothesis of the brain's visual system, in which the ventral stream (the "what" pathway) processes spatial information such as object shape and color, while the dorsal stream (the "where" pathway) processes motion- and position-related information. Although this method improves on single-stream approaches by explicitly capturing local temporal motion, the video-level prediction is obtained by averaging the prediction scores of sampled clips, so medium- and long-term temporal information is still lost in the learned features. There is therefore considerable room to improve two-stream video recognition methods.
Summary of the invention
The purpose of the present invention is to overcome the above drawbacks of the prior art by providing a multi-modal action recognition method based on a deep neural network. The method adds the human skeleton as a further modality on top of the two-stream convolutional network. Human pose estimation is comparatively tractable (the human skeleton, i.e., the body key points, exhibits strong structural correlations, so bottom-up and top-down cues can be combined for localization), and mature open frameworks such as AlphaPose exist. Introducing pose into action recognition eliminates, on the one hand, the interference of irrelevant background; on the other hand, it finely characterizes how the position of each key point changes over the frame sequence during human motion, which benefits the recognition of the action. The method performs multi-modal action recognition with a deep neural network having a multi-branch structure, in which the image branch processes spatial information such as object shape and color; the optical flow branch processes motion- and position-related information; and the skeleton branch finely characterizes the action by processing the path integral features of the frame sequence. In addition, the invention introduces a pooling method based on an attention mechanism into the image branch, so that the image branch automatically places its attention on the regions of interest most relevant to the action class, further improving the accuracy of the action recognition method.
The purpose of the present invention can be achieved by adopting the following technical scheme:
A multi-modal action recognition method based on a deep neural network, the action recognition method comprising the following steps:
S1, acquire public databases and convert every frame of the video data into a set of RGB pictures, each named according to the rule "video name + time + action id" so that the data can be separated by filename. The data are split into a training set and a test set at a ratio of 3:1, and the action id covers the following six basic actions: walking, running, waving, bending over, jumping and standing.
S2, unify the resolution of the data set obtained in step S1.
S3, compress the image data set processed in step S2 to reduce the computation load, i.e., compress the pixel information of every video frame using the discrete cosine transform of the image.
S4, along the time dimension of the video data processed in step S3, delete video frames whose time interval is within the interval threshold or whose picture similarity exceeds the similarity threshold.
S5, extract the optical flow information of N consecutive video frames from the data processed in step S4, where N is a positive integer greater than or equal to 10.
S6, using an open-source pose estimation algorithm such as AlphaPose, extract the human skeleton from the video frame by frame, thereby obtaining a frame sequence, and compute the path integral features of that frame sequence.
S7, feed the optical flow information extracted in step S5, the human skeleton path integral features extracted in step S6 and the video images processed in step S4 into the deep neural network as inputs. At its lower layers, the deep neural network has three branches: a convolutional neural network for extracting temporal features, a convolutional neural network for extracting spatial features, and a fully connected network for processing the skeleton path integral features. At the higher layers, the three branches of the lower layers are merged into one branch by feature fusion, and the class id of the video action is predicted through a softmax activation function.
Further, the databases acquired in step S1 mainly include the KTH human behavior database and the UCF Sports database.
Further, step S2 unifies the video image resolution to 120*90.
Further, step S3 applies a discrete cosine transform to every frame of the video data, thresholds the transformed DCT coefficients by zeroing those below a certain threshold so that the compression ratio is 10:1, and then applies the inverse DCT to obtain the single-frame images of the compressed video data.
Further, along the time dimension, step S4 deletes video frames that are spaced within 500 ms and whose similarity exceeds 70%, reducing redundancy. The interval threshold ranges from 400 ms to 1000 ms, with a typical value of 500 ms; the similarity threshold ranges from 0.5 to 0.9, with a typical value of 0.7.
Further, step S5 extracts the optical flow information of 10 consecutive video frames from the processed data, mainly by the Lucas-Kanade algorithm, which solves the basic optical flow equation for all pixels in a neighborhood using the least squares principle to obtain the required optical flow information.
Further, step S6 uses an open-source pose estimation algorithm such as AlphaPose to extract the human skeleton from the video frame by frame, thereby obtaining a frame sequence, and computes path integral features over that frame sequence.
Further, step S7 feeds the optical flow information extracted in step S5, the human skeleton path integral features extracted in step S6 and the video images processed in step S4 into the deep neural network, whose network structure is as follows:
At its lower layers the deep neural network has three branches, namely an image branch, an optical flow branch and a skeleton branch, corresponding to the inputs of the three modalities; at the higher layers the three branches are merged into one branch by feature fusion. The image branch is a convolutional neural network connected in sequence from input layer to output layer as: convolutional layer conv1, pooling layer pooling1, convolutional layer conv2, pooling layer pooling2, convolutional layer conv3, convolutional layer conv4, convolutional layer conv5, attention pooling layer, fully connected layer fc6, fully connected layer fc7, fully connected layer fc8, data fusion layer fusion, loss function layer loss.
The optical flow branch is a convolutional neural network connected in sequence from input layer to output layer as: convolutional layer conv1, pooling layer pooling1, convolutional layer conv2, pooling layer pooling2, convolutional layer conv3, convolutional layer conv4, convolutional layer conv5, pooling layer pooling5, fully connected layer fc6, fully connected layer fc7, fully connected layer fc8, data fusion layer fusion, loss function layer loss.
The skeleton branch is a fully connected network consisting, from input layer to output layer, of fully connected layer fc1, fully connected layer fc2, data fusion layer fusion and loss function layer loss.
Further, the data fusion layer fusion classifies the input video action through a softmax activation function and optimizes the network parameters by minimizing the classification loss function.
Further, the attention pooling layer of the image branch in step S7 introduces an attention mechanism. Two learnable weight vectors are constructed over the post-convolution features: a bottom-up saliency weight vector b and a top-down attention weight vector a. Matrix operations then apply bottom-up saliency weighting and top-down attention weighting to the feature maps, and the two responses are finally fused to obtain the result. Let the feature to be pooled be X with X ∈ R^(n×f) and a, b ∈ R^(f×1), where n is the spatial size of the feature map to be pooled and f is its number of channels. Xb ∈ R^(n×1) is the saliency-weighted (bottom-up) projection map of X; this map is class-agnostic and has a thickness of 1. Xa is the attention-weighted (top-down) projection map of X. Since different classes should have different attention weight vectors a, let the number of classes be K; then the attention weight matrix over all classes is A ∈ R^(f×K), and the top-down attention projection maps are given by XA ∈ R^(n×K). Finally, the class-specific attention projection Xa_k of each class k is multiplied element-wise with the saliency projection Xb and the products are summed, yielding the attention-weighted feature of that class.
Further, the training of the network model is not restricted to a specific training framework; the Caffe, MXNet, Torch, TensorFlow and similar frameworks can all be used.
Compared with the prior art, the present invention has the following advantages and effects:
(1) The multi-modal action recognition method based on a deep neural network disclosed by the invention uses a deep neural network with a multi-branch structure, in which the image branch processes spatial information such as object shape and color; the optical flow branch processes motion- and position-related information; and the skeleton branch finely characterizes the action by processing the path integral features of the frame sequence.
(2) The method first reduces the computation load of the network through preprocessing, thereby substantially shortening the running time, and it jointly exploits multi-modal information such as video images, optical flow maps and human skeletons, significantly improving video action recognition accuracy.
(3) The method introduces an attention-weighted pooling operation into the pooling layer of the image branch, which learns a weight for each pooling unit during training. Pooling units with larger weights correspond to abstract features closely tied to the action, while units with smaller weights correspond to features that should be ignored or that would interfere with action recognition. After passing through the attention-based pooling structure, features irrelevant to the action class are suppressed and features closely tied to the action are "amplified", improving the accuracy and precision of action recognition.
Brief description of the drawings
Fig. 1 is a schematic diagram of the model of the multi-modal action recognition method based on a deep neural network disclosed by the present invention;
Fig. 2 is a schematic diagram of the computation of the attention-based pooling structure proposed by the present invention;
Fig. 3 is a flow chart of the multi-modal action recognition method based on a deep neural network disclosed by the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. The described embodiments are obviously only a part of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative work shall fall within the protection scope of the present invention.
Embodiment
As shown in Fig. 1, this embodiment discloses a multi-modal action recognition method based on a deep neural network.
At its lower layers, the deep neural network used in this embodiment has three branches: a convolutional neural network for extracting temporal features, a convolutional neural network for extracting spatial features, and a fully connected network for processing the skeleton path integral features. At the higher layers, the three branches are merged into one branch by feature fusion, and the class id of the video action is predicted through a softmax activation function. In the image branch, a pooling structure based on an attention mechanism is introduced; without changing the existing network infrastructure, it helps the network focus on the features that aid action recognition, thereby reducing the interference of irrelevant features, improving the performance of the existing network, and enabling the video human action recognition system to be applied more effectively in engineering.
As an embodiment of the present invention, complete training data improves the training precision of the model; in addition, preprocessing and compressing the data further reduces the interference of redundant and irrelevant information and the computation load of the model, thereby shortening the training time and improving the training precision. Accordingly, the multi-modal action recognition method based on a deep neural network of this embodiment proceeds as follows:
S1, collection of training data.
Public databases are acquired, mainly including the KTH human behavior database and the UCF Sports database. Every frame of the video data is converted into a set of RGB pictures, each named according to the rule "video name + time + action id" so that the data can be separated by filename. The data are split into a training set and a test set at a ratio of 3:1, and the action id covers six basic actions: walking, running, waving, bending over, jumping and standing.
S2, the data set obtained in step S1 is normalized, i.e., its resolution is unified: the picture of every frame is compressed to a uniform resolution of 120*90. On the basis of preserving the information content of the images as far as possible, this reduces the computation load of the convolutional neural network model and improves recognition speed.
S3, the image data set processed in step S2 is compressed to reduce the computation load: the pixel information of every video frame is compressed with the discrete cosine transform (DCT) at a compression ratio of 10:1, which reduces the amount of information handled during initialization. A discrete cosine transform is applied to the original image, the transformed DCT coefficients are thresholded by zeroing those below the threshold, thereby compressing and quantizing the image, and the inverse DCT is then applied to obtain the final compressed image, as in the sketch below.
S4, along the time dimension of the video data processed in step S3, video frames within 500 ms of each other or with a picture similarity above 0.7 are deleted, reducing redundancy.
The picture similarity is computed by the following steps (see the sketch after the list):
S41, scale the pictures: compress each picture to a common size of 8*8, i.e., 64 pixel values;
S42, simplify the colors by converting to a grayscale image;
S43, compute the average value: compute the average of the pixel values of all pixels of the grayscale image;
S44, compare pixel gray values: traverse each pixel value of the grayscale image against the average computed in the previous step; record 1 if it is greater than the average, otherwise 0;
S45, obtain the 64-bit image fingerprint;
S46, compute the Hamming distance between the fingerprints of the two pictures and use it as the picture similarity.
S5, the optical flow method exploits the temporal variation of pixels in an image sequence and the correlation between consecutive frames to establish the correspondence between the previous frame and the current frame, and thereby compute the motion of objects between consecutive frames. The Lucas-Kanade method is a widely used differential optical flow estimator: it solves the basic optical flow equation for all pixels in a neighborhood using the least squares principle, and compared with common point-by-point methods it is less sensitive to image noise. The bidirectional optical flow of 10 consecutive video frames is therefore extracted from the video frame data processed in step S4 using the Lucas-Kanade algorithm. The algorithm is described in Lucas B. and Kanade T., "An Iterative Image Registration Technique with an Application to Stereo Vision", Proc. of the 7th International Joint Conference on Artificial Intelligence (IJCAI), pp. 674-679, and is implemented in OpenCV; this implementation therefore extracts the optical flow information with the Lucas-Kanade routines of OpenCV, as sketched below.
S6, path integral features: through path iterated integrals, information characterizing the path such as displacement and curvature can be extracted, yielding rich dynamical information about the path. Using a pose estimation algorithm such as AlphaPose, the human skeleton is extracted frame by frame from the video data processed in step S4, giving a skeleton time series. Let the number of video frames be N and the number of key points be K (here 15); each key point has two coordinates, so the frame sequence is a path of dimension 2K and length N. It can be written P_d = {X_1, X_2, ..., X_N}, where each X_i is a 2K-dimensional vector. The discrete path P_d is a sampling of the underlying continuous key-point path P_t: [0, T] → R^d. For P_t, the k-th order path iterated integral is defined as:

I_k = ∫_{0 < t_1 < t_2 < ... < t_k < T} dP_{t_1} ⊗ dP_{t_2} ⊗ ... ⊗ dP_{t_k}

The path integral feature is then the collection of the iterated integrals of all orders, an infinite-dimensional vector; the 0-th order path iterated integral is defined to be 1. In engineering practice, the iterated integrals of the first m orders generally suffice to characterize the dynamics of the path, so the truncated path integral feature is taken as:

S(X)|_m = {1, I_1, I_2, ..., I_m}

In practice only P_d is available rather than P_t; the path integrals are then computed by tensor algebra, as in the sketch below.
After the data constructed by the above downloading and preprocessing steps have been partitioned into a training set and a test set at a ratio of 3:1, the neural network model for action recognition is built as follows.
S7, the optical flow information extracted in step S5, the human skeleton path integral features extracted in step S6 and the video images processed in step S4 are fed into the deep neural network. At its lower layers the network has three branches, namely the image branch, the optical flow branch and the skeleton branch, corresponding to the inputs of the three modalities; at the higher layers the three branches are merged into one branch by feature fusion.
The image branch is a convolutional neural network connected in sequence from input layer to output layer as: convolutional layer conv1, pooling layer pooling1, convolutional layer conv2, pooling layer pooling2, convolutional layer conv3, convolutional layer conv4, convolutional layer conv5, attention pooling layer, fully connected layer fc6, fully connected layer fc7, fully connected layer fc8, data fusion layer fusion, loss function layer loss.
The optical flow branch is a convolutional neural network connected in sequence from input layer to output layer as: convolutional layer conv1, pooling layer pooling1, convolutional layer conv2, pooling layer pooling2, convolutional layer conv3, convolutional layer conv4, convolutional layer conv5, pooling layer pooling5, fully connected layer fc6, fully connected layer fc7, fully connected layer fc8, data fusion layer fusion, loss function layer loss.
The skeleton branch is a fully connected network consisting, from input layer to output layer, of fully connected layer fc1, fully connected layer fc2, data fusion layer fusion and loss function layer loss.
The overall network structure is shown in Fig. 1. In the multi-branch neural network, the image branch captures the spatial dependencies in the video, the optical flow branch captures the periodic motion present at each spatial position, and the skeleton branch finely characterizes the spatio-temporal changes of the human key-point positions during the action. The three branches pass through their respective feature learning networks and are merged at the data fusion layer fusion, yielding the final abstract spatio-temporal features relevant to action recognition. The features of the fusion layer pass through a softmax activation to predict the action class. A minimal sketch of this three-branch fusion follows.
In the image branch, attention pooling is a pooling computation structure based on an attention mechanism, as shown in Fig. 2. Two learnable weight vectors are constructed over the post-convolution features: a bottom-up saliency weight vector b and a top-down attention weight vector a. Matrix operations then apply bottom-up saliency weighting and top-down attention weighting to the feature maps, and the two responses are finally fused to obtain the result. Let the feature to be pooled be X with X ∈ R^(n×f) and a, b ∈ R^(f×1), where n is the spatial size of the feature map to be pooled and f is its number of channels. Xb ∈ R^(n×1) is the saliency-weighted (bottom-up) projection map of X; this map is class-agnostic and has a thickness of 1, as shown in Fig. 2. Xa is the attention-weighted (top-down) projection map of X. Since different classes should have different attention weight vectors a, let the number of classes be K; then the attention weight matrix over all classes is A ∈ R^(f×K), and the top-down attention projection maps are given by XA ∈ R^(n×K). As shown in Fig. 2, the class-specific attention projection Xa_k of each class k is finally multiplied element-wise with the saliency projection Xb and the products are summed, yielding the attention-weighted feature of that class, as in the sketch below.
Through multi-modal fusion and the attention mechanism, the multi-modal action recognition method based on a deep neural network of the present invention captures spatio-temporal features relevant to action recognition, thereby improving the recognition accuracy of the network.
The above embodiment is a preferred embodiment of the present invention, but the embodiments of the present invention are not limited by it; any change, modification, substitution, combination or simplification made without departing from the spirit and principle of the present invention shall be regarded as an equivalent replacement and is included within the protection scope of the present invention.
Claims (10)
1. A multi-modal action recognition method based on a deep neural network, characterized in that the action recognition method comprises:
S1, collection of training data: acquiring public databases, constructing a training set and a test set at a certain ratio, and converting every frame of the video data into a set of RGB pictures;
S2, unifying the resolution of the data set obtained in step S1, i.e., compressing the picture specification of every frame of the video data;
S3, compressing the image data set processed in step S2;
S4, along the time dimension of the video data processed in step S3, deleting video frames whose time interval is within an interval threshold or whose picture similarity exceeds a similarity threshold;
S5, extracting the bidirectional optical flow information of N consecutive video frames from the data processed in step S4, where N is a positive integer greater than or equal to 10;
S6, extracting the human skeleton frame by frame from the data processed in step S4, and computing the path integral features of the frame sequence;
S7, feeding the optical flow information extracted in step S5, the human skeleton path integral features extracted in step S6 and the video images processed in step S4 into the deep neural network as inputs, wherein the deep neural network has three branches at its lower layers, corresponding to the inputs of the three modalities; the three branches of the lower layers are merged into one branch by feature fusion at the higher layers, and the class id of the input video action is predicted through a softmax activation function.
2. The multi-modal action recognition method based on a deep neural network according to claim 1, characterized in that in step S1 every frame image is named according to the rule "video name + time + action id", the filename being used to separate the data, wherein the action id serves as the class id of the video action and covers the following basic actions: walking, running, waving, bending over, jumping and standing.
3. The multi-modal action recognition method based on a deep neural network according to claim 1, characterized in that in step S3 the pixel information of every video frame is compressed using the discrete cosine transform of the image, as follows:
a discrete cosine transform is applied to every frame of the video data, the transformed DCT coefficients are thresholded by zeroing those below a certain threshold, and the inverse DCT is then applied to obtain the single-frame images of the compressed video data.
4. The multi-modal action recognition method based on a deep neural network according to claim 1, characterized in that in step S4 video frames within 500 ms of each other or with a picture similarity above 0.7 are deleted.
5. The multi-modal action recognition method based on a deep neural network according to claim 1, characterized in that the picture similarity is computed as follows:
S41, scale the pictures: compress each picture to a proportional size W*W, where W is a pixel count;
S42, simplify the colors by converting to a grayscale image;
S43, compute the average of the pixel values of all pixels of the grayscale image;
S44, compare pixel gray values: traverse each pixel value of the grayscale image against the above average; record 1 if it is greater than the average, otherwise 0;
S45, obtain the W^2-bit image fingerprint;
S46, compute the Hamming distance between the fingerprints of the two pictures and use it as the picture similarity.
6. The multi-modal action recognition method based on a deep neural network according to claim 1, characterized in that in step S5 the Lucas-Kanade algorithm is used to solve the basic optical flow equation for all pixels in a neighborhood by the least squares principle, finally extracting the bidirectional optical flow information of the N consecutive video frames.
7. The multi-modal action recognition method based on a deep neural network according to claim 1, characterized in that step S6 uses an open-source pose estimation algorithm to extract the human skeleton from the video frame by frame, thereby obtaining a frame sequence, and computes path integral features over that frame sequence.
8. The multi-modal action recognition method based on a deep neural network according to claim 1, characterized in that the network structure of the deep neural network in step S7 is as follows:
at its lower layers the deep neural network has three branches, namely an image branch, an optical flow branch and a skeleton branch, corresponding to the inputs of the three modalities; at the higher layers the three branches are merged into one branch by feature fusion, wherein the image branch is a convolutional neural network connected in sequence from input layer to output layer as: convolutional layer conv1, pooling layer pooling1, convolutional layer conv2, pooling layer pooling2, convolutional layer conv3, convolutional layer conv4, convolutional layer conv5, attention pooling layer, fully connected layer fc6, fully connected layer fc7, fully connected layer fc8, data fusion layer fusion, loss function layer loss;
the optical flow branch is a convolutional neural network connected in sequence from input layer to output layer as: convolutional layer conv1, pooling layer pooling1, convolutional layer conv2, pooling layer pooling2, convolutional layer conv3, convolutional layer conv4, convolutional layer conv5, pooling layer pooling5, fully connected layer fc6, fully connected layer fc7, fully connected layer fc8, data fusion layer fusion, loss function layer loss;
the skeleton branch is a fully connected network consisting, from input layer to output layer, of fully connected layer fc1, fully connected layer fc2, data fusion layer fusion and loss function layer loss.
9. The multi-modal action recognition method based on a deep neural network according to claim 8, characterized in that the data fusion layer fusion classifies the input video action through a softmax activation function and optimizes the network parameters by minimizing the classification loss function.
10. The multi-modal action recognition method based on a deep neural network according to claim 8, characterized in that the attention pooling layer of the image branch in step S7 introduces an attention mechanism: two learnable weight vectors are constructed over the post-convolution features, namely a bottom-up saliency weight vector b and a top-down attention weight vector a; matrix operations then apply bottom-up saliency weighting and top-down attention weighting to the feature maps, and the two responses are finally fused to obtain the result; let the feature to be pooled be X with X ∈ R^(n×f) and a, b ∈ R^(f×1), where n is the spatial size of the feature map to be pooled and f is its number of channels; Xb ∈ R^(n×1) is the saliency-weighted (bottom-up) projection map of X, which is class-agnostic and has a thickness of 1; Xa is the attention-weighted (top-down) projection map of X; since different classes should have different attention weight vectors a, let the number of classes be K, so that the attention weight matrix of all classes is A ∈ R^(f×K) and the top-down attention projection maps are given by XA ∈ R^(n×K); finally, the class-specific attention projection Xa_k of each class is multiplied element-wise with the saliency projection Xb and the products are summed, yielding the attention-weighted feature matrix of that class.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811165862.4A CN109460707A (en) | 2018-10-08 | 2018-10-08 | Multi-modal action recognition method based on a deep neural network
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811165862.4A CN109460707A (en) | 2018-10-08 | 2018-10-08 | Multi-modal action recognition method based on a deep neural network
Publications (1)
Publication Number | Publication Date |
---|---|
CN109460707A true CN109460707A (en) | 2019-03-12 |
Family
ID=65607315
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811165862.4A Pending CN109460707A (en) | 2018-10-08 | 2018-10-08 | Multi-modal action recognition method based on a deep neural network
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109460707A (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160321498A1 (en) * | 2007-01-12 | 2016-11-03 | International Business Machines Corporation | Warning a user about adverse behaviors of others within an environment based on a 3d captured image stream |
WO2014155215A1 (en) * | 2013-03-29 | 2014-10-02 | Università Degli Studi Dell'aquila | Method and apparatus for monitoring the personal exposure to static or quasi- static magnetic fields |
CN104156693A (en) * | 2014-07-15 | 2014-11-19 | 天津大学 | Motion recognition method based on multi-model sequence fusion |
CN108288035A (en) * | 2018-01-11 | 2018-07-17 | 华南理工大学 | The human motion recognition method of multichannel image Fusion Features based on deep learning |
Non-Patent Citations (2)
Title |
---|
CHUNHUI LIU et al.: "PKU-MMD: A Large Scale Benchmark for Continuous Multi-Modal Human Action Understanding", https://arxiv.org/abs/1703.07475 *
ROHIT GIRDHAR et al.: "Attentional Pooling for Action Recognition", https://arxiv.org/pdf/1711.01467v3.pdf *
Cited By (57)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109583334A (en) * | 2018-11-16 | 2019-04-05 | 中山大学 | A kind of action identification method and its system based on space time correlation neural network |
CN109948528B (en) * | 2019-03-18 | 2023-04-07 | 南京砺剑光电技术研究院有限公司 | Robot behavior identification method based on video classification |
CN109948528A (en) * | 2019-03-18 | 2019-06-28 | 南京砺剑光电技术研究院有限公司 | A kind of robot behavior recognition methods based on visual classification |
CN110096968A (en) * | 2019-04-10 | 2019-08-06 | 西安电子科技大学 | A kind of ultrahigh speed static gesture identification method based on depth model optimization |
CN110096968B (en) * | 2019-04-10 | 2023-02-07 | 西安电子科技大学 | Ultra-high-speed static gesture recognition method based on depth model optimization |
CN110197116A (en) * | 2019-04-15 | 2019-09-03 | 深圳大学 | A kind of Human bodys' response method, apparatus and computer readable storage medium |
CN110059620A (en) * | 2019-04-17 | 2019-07-26 | 安徽艾睿思智能科技有限公司 | Bone Activity recognition method based on space-time attention |
CN110059620B (en) * | 2019-04-17 | 2021-09-03 | 安徽艾睿思智能科技有限公司 | Skeletal behavior identification method based on space-time attention |
CN110135304A (en) * | 2019-04-30 | 2019-08-16 | 北京地平线机器人技术研发有限公司 | Human body method for recognizing position and attitude and device |
CN110135386A (en) * | 2019-05-24 | 2019-08-16 | 长沙学院 | A kind of human motion recognition method and system based on deep learning |
CN110175266B (en) * | 2019-05-28 | 2020-10-30 | 复旦大学 | Cross-modal retrieval method for multi-segment video |
CN110175266A (en) * | 2019-05-28 | 2019-08-27 | 复旦大学 | A method of it is retrieved for multistage video cross-module state |
CN110263666A (en) * | 2019-05-29 | 2019-09-20 | 西安交通大学 | A kind of motion detection method based on asymmetric multithread |
CN110263666B (en) * | 2019-05-29 | 2021-01-19 | 西安交通大学 | Action detection method based on asymmetric multi-stream |
CN110232412A (en) * | 2019-05-30 | 2019-09-13 | 清华大学 | A kind of body gait prediction technique based on multi-modal deep learning |
CN112131908A (en) * | 2019-06-24 | 2020-12-25 | 北京眼神智能科技有限公司 | Action identification method and device based on double-flow network, storage medium and equipment |
CN112131908B (en) * | 2019-06-24 | 2024-06-11 | 北京眼神智能科技有限公司 | Action recognition method, device, storage medium and equipment based on double-flow network |
CN110084228A (en) * | 2019-06-25 | 2019-08-02 | 江苏德劭信息科技有限公司 | A kind of hazardous act automatic identifying method based on double-current convolutional neural networks |
CN110298332A (en) * | 2019-07-05 | 2019-10-01 | 海南大学 | Method, system, computer equipment and the storage medium of Activity recognition |
CN110491479A (en) * | 2019-07-16 | 2019-11-22 | 北京邮电大学 | A kind of construction method of sclerotin status assessment model neural network based |
CN110458038A (en) * | 2019-07-19 | 2019-11-15 | 天津理工大学 | The cross-domain action identification method of small data based on double-strand depth binary-flow network |
CN110472532B (en) * | 2019-07-30 | 2022-02-25 | 中国科学院深圳先进技术研究院 | Video object behavior identification method and device |
CN110472532A (en) * | 2019-07-30 | 2019-11-19 | 中国科学院深圳先进技术研究院 | A kind of the video object Activity recognition method and apparatus |
CN110398369A (en) * | 2019-08-15 | 2019-11-01 | 贵州大学 | A kind of Fault Diagnosis of Roller Bearings merged based on 1-DCNN and LSTM |
WO2021035807A1 (en) * | 2019-08-23 | 2021-03-04 | 深圳大学 | Target tracking method and device fusing optical flow information and siamese framework |
CN110516595A (en) * | 2019-08-27 | 2019-11-29 | 中国民航大学 | Finger multi-modal fusion recognition methods based on convolutional neural networks |
CN110516595B (en) * | 2019-08-27 | 2023-04-07 | 中国民航大学 | Finger multi-mode feature fusion recognition method based on convolutional neural network |
CN111046227B (en) * | 2019-11-29 | 2023-04-07 | 腾讯科技(深圳)有限公司 | Video duplicate checking method and device |
CN111046227A (en) * | 2019-11-29 | 2020-04-21 | 腾讯科技(深圳)有限公司 | Video duplicate checking method and device |
CN111027472A (en) * | 2019-12-09 | 2020-04-17 | 北京邮电大学 | Video identification method based on fusion of video optical flow and image space feature weight |
CN111274998A (en) * | 2020-02-17 | 2020-06-12 | 上海交通大学 | Parkinson's disease finger knocking action identification method and system, storage medium and terminal |
CN111274998B (en) * | 2020-02-17 | 2023-04-28 | 上海交通大学 | Parkinson's disease finger knocking action recognition method and system, storage medium and terminal |
CN111310707B (en) * | 2020-02-28 | 2023-06-20 | 山东大学 | Bone-based graph annotation meaning network action recognition method and system |
CN111310707A (en) * | 2020-02-28 | 2020-06-19 | 山东大学 | Skeleton-based method and system for recognizing attention network actions |
CN113761975B (en) * | 2020-06-04 | 2023-12-15 | 南京大学 | Human skeleton action recognition method based on multi-mode feature fusion |
CN113761975A (en) * | 2020-06-04 | 2021-12-07 | 南京大学 | Human skeleton action recognition method based on multi-mode feature fusion |
CN111695523B (en) * | 2020-06-15 | 2023-09-26 | 浙江理工大学 | Double-flow convolutional neural network action recognition method based on skeleton space-time and dynamic information |
CN111695523A (en) * | 2020-06-15 | 2020-09-22 | 浙江理工大学 | Double-current convolutional neural network action identification method based on skeleton space-time and dynamic information |
CN111754620B (en) * | 2020-06-29 | 2024-04-26 | 武汉市东旅科技有限公司 | Human body space motion conversion method, conversion device, electronic equipment and storage medium |
CN111754620A (en) * | 2020-06-29 | 2020-10-09 | 武汉市东旅科技有限公司 | Human body space motion conversion method, conversion device, electronic equipment and storage medium |
CN111931602B (en) * | 2020-07-22 | 2023-08-08 | 北方工业大学 | Attention mechanism-based multi-flow segmented network human body action recognition method and system |
CN111931602A (en) * | 2020-07-22 | 2020-11-13 | 北方工业大学 | Multi-stream segmented network human body action identification method and system based on attention mechanism |
CN112183240A (en) * | 2020-09-11 | 2021-01-05 | 山东大学 | Double-current convolution behavior identification method based on 3D time stream and parallel space stream |
CN112183240B (en) * | 2020-09-11 | 2022-07-22 | 山东大学 | Double-current convolution behavior identification method based on 3D time stream and parallel space stream |
CN112396018B (en) * | 2020-11-27 | 2023-06-06 | 广东工业大学 | Badminton player foul action recognition method combining multi-mode feature analysis and neural network |
CN112396018A (en) * | 2020-11-27 | 2021-02-23 | 广东工业大学 | Badminton player foul action recognition method combining multi-modal feature analysis and neural network |
CN112686193B (en) * | 2021-01-06 | 2024-02-06 | 东北大学 | Action recognition method and device based on compressed video and computer equipment |
CN112686193A (en) * | 2021-01-06 | 2021-04-20 | 东北大学 | Action recognition method and device based on compressed video and computer equipment |
CN113065451A (en) * | 2021-03-29 | 2021-07-02 | 四川翼飞视科技有限公司 | Multi-mode fused action recognition device and method and storage medium |
CN113065451B (en) * | 2021-03-29 | 2022-08-09 | 四川翼飞视科技有限公司 | Multi-mode fused action recognition device and method and storage medium |
CN113033430A (en) * | 2021-03-30 | 2021-06-25 | 中山大学 | Bilinear-based artificial intelligence method, system and medium for multi-modal information processing |
CN113033430B (en) * | 2021-03-30 | 2023-10-03 | 中山大学 | Artificial intelligence method, system and medium for multi-mode information processing based on bilinear |
CN113902995A (en) * | 2021-11-10 | 2022-01-07 | 中国科学技术大学 | Multi-mode human behavior recognition method and related equipment |
CN113902995B (en) * | 2021-11-10 | 2024-04-02 | 中国科学技术大学 | Multi-mode human behavior recognition method and related equipment |
CN114611584A (en) * | 2022-02-21 | 2022-06-10 | 上海市胸科医院 | CP-EBUS elastic mode video processing method, device, equipment and medium |
CN114821206B (en) * | 2022-06-30 | 2022-09-13 | 山东建筑大学 | Multi-modal image fusion classification method and system based on confrontation complementary features |
CN114821206A (en) * | 2022-06-30 | 2022-07-29 | 山东建筑大学 | Multi-modal image fusion classification method and system based on confrontation complementary features |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109460707A (en) | | Multi-modal action recognition method based on a deep neural network | |
Fan et al. | Point spatio-temporal transformer networks for point cloud video modeling | |
Fan et al. | Deep hierarchical representation of point cloud videos via spatio-temporal decomposition | |
CN110516620A (en) | Method for tracking target, device, storage medium and electronic equipment | |
CN112530019B (en) | Three-dimensional human body reconstruction method and device, computer equipment and storage medium | |
CN114220176A (en) | Human behavior recognition method based on deep learning | |
CN112288627B (en) | Recognition-oriented low-resolution face image super-resolution method | |
CN112446342B (en) | Key frame recognition model training method, recognition method and device | |
CN111985343A (en) | Method for constructing behavior recognition deep network model and behavior recognition method | |
CN110390294B (en) | Target tracking method based on bidirectional long-short term memory neural network | |
CN111444370A (en) | Image retrieval method, device, equipment and storage medium thereof | |
CN113111842A (en) | Action recognition method, device, equipment and computer readable storage medium | |
CN116311525A (en) | Video behavior recognition method based on cross-modal fusion | |
CN113920581A (en) | Method for recognizing motion in video by using space-time convolution attention network | |
CN114419732A (en) | HRNet human body posture identification method based on attention mechanism optimization | |
CN110942037A (en) | Action recognition method for video analysis | |
CN114333002A (en) | Micro-expression recognition method based on deep learning of image and three-dimensional reconstruction of human face | |
Cha et al. | Learning 3D skeletal representation from transformer for action recognition | |
CN117893957A (en) | System and method for flow counting | |
Wang et al. | Lightweight channel-topology based adaptive graph convolutional network for skeleton-based action recognition | |
CN115359550A (en) | Gait emotion recognition method and device based on Transformer, electronic device and storage medium | |
CN111626212B (en) | Method and device for identifying object in picture, storage medium and electronic device | |
CN112001313A (en) | Image identification method and device based on attribution key points | |
CN117809109A (en) | Behavior recognition method based on multi-scale time features | |
CN113822117B (en) | Data processing method, device and computer readable storage medium |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20190312 |