CN112861758A - Behavior identification method based on weak supervised learning video segmentation - Google Patents
Behavior identification method based on weak supervised learning video segmentation
- Publication number
- CN112861758A (application CN202110207458.4A)
- Authority
- CN
- China
- Prior art keywords
- video
- frame
- length
- segment
- label
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/047—Probabilistic or stochastic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/49—Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Computing Systems (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Multimedia (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Molecular Biology (AREA)
- Mathematical Physics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Probability & Statistics with Applications (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Biology (AREA)
- Human Computer Interaction (AREA)
- Psychiatry (AREA)
- Social Psychology (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a behavior identification method based on weakly supervised learning video segmentation. The method comprises the following steps: dividing the whole video into N segments, where N is not known in advance, assigning a class label and a length label to each segment, and generating frame labels for the video segments with the Viterbi algorithm, which are used to compute a frame-by-frame cross-entropy loss; finding the best action segmentation points in the initial video segmentation produced by the Viterbi algorithm and decomposing it into a visual model, a length model and a context model; connecting the input data sequence in forward propagation with a single-layer GRU network with 256 gated recurrent units and a softmax output, obtaining the visual model from the posterior probabilities and the length model from a class-wise Poisson distribution; defining an auxiliary function and finding the optimal segmentation points; and finally obtaining the most likely segmentation of the complete video from the length model and the auxiliary function. The weakly supervised annotations are thus fully utilized to segment the actions in a complete video.
Description
Technical Field
The invention relates to the field of video behavior identification, and in particular to a behavior identification method based on weakly supervised learning video segmentation.
Background
In recent years, the generation of large amounts of video data has driven research on video behavior identification. The overall trend in behavior recognition has shifted from static scenes to dynamic scenes, from detecting and recognizing a single moving target to detecting and analyzing multiple moving targets, and from simple individual behaviors to complex actions and even group behavior recognition and detection. Video behavior datasets such as Breakfast and 50 Salads are discussed continuously in top computer vision conference papers, and the classification and temporal segmentation of the activities in these datasets have become popular topics in video behavior recognition research.
Video behavior recognition mainly relies on two annotation modes of a dataset: full supervision and weak supervision. Full supervision consumes a large amount of labor to delimit and label the action frames and classes in a video; weak supervision only provides the ordered sequence of action classes in a video, without the specific start and end frame of each action, so that the temporal action segmentation and labeling must be learned from the action transcripts formed by the action labels. Traditional video behavior recognition algorithms include dynamic time warping, the CDP algorithm, HMMs and the Viterbi algorithm; basic deep-learning methods include the two-stream approach, LSTM, GRU, C3D and I3D. These methods perform well at detecting the action categories in a video, but they are not efficient for weakly supervised video segmentation.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a behavior recognition method based on weakly supervised learning video segmentation. It uses the frame labels generated by the Viterbi algorithm to compute a frame-by-frame cross-entropy loss L, updates the network parameters by stochastic gradient descent on the gradient ∇L of this loss, connects the input data sequence in forward propagation with a single-layer GRU network with 256 gated recurrent units and a softmax output, and computes posterior probabilities to obtain the most likely segmentation of the complete video, so that the different actions in a video are segmented accurately.
The technical scheme adopted by the invention is as follows:
The method comprises the following steps:
Step (1): the whole video is divided into N segments, a class label c and a length label l are assigned to each segment, and frame labels are generated for the resulting video segments with the Viterbi algorithm; these frame labels are used to compute the frame-by-frame cross-entropy loss L, and, based on the cross-entropy loss L of all video frames, the GRU network parameters are updated by stochastic gradient descent on its gradient ∇L.
Step (2): in the initial video segmentation (ĉ_i, l̂_i), i = 1, ..., N, obtained by the Viterbi algorithm in step (1), where i is the video segment index, the optimal action segmentation points are found and the segmentation is decomposed to obtain a visual model, a length model and a context model.
Step (3): a single-layer GRU network with 256 gated recurrent units and a softmax output connects the input data sequence in forward propagation; the visual model of step (2) is obtained by dividing the posterior probability by the class prior probability, and the length model of step (2) is obtained from a class-wise Poisson distribution.
Step (4): an auxiliary function is defined and the optimal segmentation points are found.
Step (5): the most likely segmentation of the complete video is obtained from the length model of step (3) and the auxiliary function of step (4).
The method has the advantage that, unlike common video behavior identification methods that only identify the action categories in a video, it requires only weakly supervised labeling of the action categories; the behavior identification method based on weakly supervised learning video segmentation segments the actions within the complete video and can identify the actions in the video accurately.
Drawings
The invention is further illustrated with reference to the following figures and examples.
FIG. 1 is a flow chart of the behavior identification method based on weakly supervised learning video segmentation according to an embodiment of the present invention;
FIG. 2 shows frames 81, 171, 366, 480 and 703 of a tea-brewing action video according to an embodiment of the present invention;
fig. 3 is an overall network structure according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings. The embodiments described with reference to the drawings are exemplary, are intended only to explain the invention, and should not be construed as limiting the invention.
First, the dataset used by the behavior recognition method based on weakly supervised learning video segmentation is introduced. The Breakfast dataset is a large-scale dataset for action segmentation comprising 1712 breakfast-preparation videos, equivalent to about 3.6 million frames or roughly 67 hours of video. It contains 10 breakfast-related activities, such as making pancakes or omelets, each with detailed annotations such as stirring or pouring. The dataset has 48 action classes, with an average of 6.9 action instances per video. The 48 action classes have fine-grained labels that mark the start and end frame of each action in text form and are typically used for action detection and segmentation. The video length varies from a few seconds to several minutes, the actions in the videos are densely labeled, and only 7% of the frames are background frames. FIG. 2 shows example frames of specific actions during a tea-brewing process in this dataset; weakly supervised learning only provides the class labels and no frame-level delimitation.
FIG. 1 is a flow chart of the method according to one embodiment of the invention.
For a video containing T frames, let X = {x_1, ..., x_t, ..., x_T}. The whole video is divided into N segments, a class label is assigned to each segment, and the video segment class labels c_1^N = {c_1, ..., c_i, ..., c_N} and segment length labels l_1^N = {l_1, ..., l_N} are output, where c_i is the class of the i-th segment, l_i is the length of the i-th segment, and i ∈ {1, ..., N}. The class label assigned to frame x_t is defined as c_{n(t)}, where n(t) is the segment index of the t-th frame and t ∈ {1, ..., T}. The most likely segmentation of the video, (ĉ_1^N, l̂_1^N), is inferred as

(ĉ_1^N, l̂_1^N) = argmax_{N, c_1^N, l_1^N} ∏_{i=1}^{N} p(c_i, l_i | X)    (1)

where p(c_i, l_i | X) is the probability of the action class and action length of the i-th segment of video X, ĉ_i is the predicted class of the i-th video segment, and l̂_i is the predicted length of the i-th video segment; equation (1) yields the segment class labels ĉ_1^N and lengths l̂_1^N.
The video frame sequence X and its class labels c_1^N are forwarded through the neural network; c_1^N is provided as the ground-truth class transcript, so only the length label of each video segment needs to be inferred during training. The class label assigned to frame x_t is called c_{n(t)}. Using the Viterbi algorithm, the action classes and lengths (c_i, l_i) of the video are written as frame-by-frame class labels c_{n(1)}, ..., c_{n(t)}, ..., c_{n(T)}, where c_{n(t)} ∈ {c_1, ..., c_i, ..., c_N}, and the cross-entropy loss over all video frames is computed as

L = − ∑_{t=1}^{T} log p(c_{n(t)} | x_t)    (2)

where p(c_{n(t)} | x_t) is the probability of action class c_{n(t)} for video frame x_t, and −log p(c_{n(t)} | x_t) is the cross-entropy loss of frame x_t. Based on the cross-entropy loss L of all video frames, the GRU network parameters are updated by stochastic gradient descent on its gradient ∇L, which in turn updates the posterior probabilities used in equation (6).
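For illustration, a minimal PyTorch sketch of this update is given below; the feature dimension, class count and learning rate are assumed values, and the frame labels here are synthetic stand-ins for the Viterbi output.

```python
# Sketch of the frame-wise cross-entropy loss (2) and the stochastic gradient
# update, assuming the frame labels c_{n(t)} have already been produced by the
# Viterbi alignment; the data below is synthetic for illustration.
import torch
import torch.nn as nn

T, feat_dim, num_classes = 120, 64, 48              # e.g. 48 Breakfast action classes
gru = nn.GRU(input_size=feat_dim, hidden_size=256, batch_first=True)  # single layer, 256 units
head = nn.Linear(256, num_classes)
optimizer = torch.optim.SGD(list(gru.parameters()) + list(head.parameters()), lr=0.01)

features = torch.randn(1, T, feat_dim)               # frame features x_1 ... x_T
frame_labels = torch.randint(0, num_classes, (T,))   # c_{n(1)} ... c_{n(T)} (synthetic here)

h, _ = gru(features)                                  # forward propagation over the sequence
log_probs = torch.log_softmax(head(h), dim=-1).squeeze(0)   # (T, C): log p(c | x_t)
loss = nn.functional.nll_loss(log_probs, frame_labels)      # mean of -log p(c_{n(t)} | x_t)

optimizer.zero_grad()
loss.backward()      # gradient of L with respect to the GRU parameters
optimizer.step()     # stochastic gradient descent update
```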
A buffer is used to store recently processed video frame sequences and their inferred frame labels; K frames are sampled from the buffer and added to the loss function:

L = − ∑_{k=1}^{K} log p(c_k | x_k)    (3)

where x_k is the k-th sampled frame of the buffered video frames, K is the total number of sampled buffer frames, and c_k is the class label corresponding to frame x_k.
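A small sketch of such a buffer, assuming a flat per-frame store and a callable that maps frame features to frame-wise log-probabilities (both are illustrative assumptions, not part of any library):

```python
# Sketch of the frame buffer behind the additional loss term (3): recently
# processed frames and their inferred labels are kept, and K of them are
# re-sampled into every update (flat per-frame storage is an assumed
# simplification of the sequence buffer described above).
import random
import torch
import torch.nn as nn

frame_buffer = []                                    # list of (feature, inferred label)

def update_buffer(features, inferred_labels, max_size=50000):
    """features: (T, D) tensor; inferred_labels: (T,) tensor from the Viterbi pass."""
    for x_t, c_t in zip(features, inferred_labels):
        frame_buffer.append((x_t.detach(), int(c_t)))
    del frame_buffer[:-max_size]                     # keep only the most recent frames

def buffer_loss(log_prob_fn, K=512):
    """L = -(1/K) sum_k log p(c_k | x_k); log_prob_fn maps (K, D) features to (K, C) log-probs."""
    sample = random.sample(frame_buffer, min(K, len(frame_buffer)))
    x = torch.stack([s[0] for s in sample])
    c = torch.tensor([s[1] for s in sample])
    return nn.functional.nll_loss(log_prob_fn(x), c)
```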
FIG. 3 shows the overall network structure of the invention. A video x_1, ..., x_T with T frames is input into the GRU network; the GRU network connects the input data sequence in forward propagation and is followed by Viterbi decoding, and the frame-by-frame class labels generated by the Viterbi algorithm are used to compute the frame-by-frame cross-entropy loss.
The objective of equation (1) is decomposed as

(ĉ_1^N, l̂_1^N) = argmax_{N, c_1^N, l_1^N} p(x_1^T | c_1^N, l_1^N) · p(l_1^N | c_1^N) · p(c_1^N)    (4)

Assuming that the video frames are independent of each other, the argmax in equation (4) can be converted into equation (5):

(ĉ_1^N, l̂_1^N) = argmax_{N, c_1^N, l_1^N} { ∏_{t=1}^{T} p(x_t | c_{n(t)}) · ∏_{n=1}^{N} p(l_n | c_n) · p(c_n | c_{n−1}) }    (5)

where n(t) is the segment index of frame t. p(x_t | c_{n(t)}) denotes the probability of video frame x_t given its class label c_{n(t)}, and is called the visual model; p(l_n | c_n) denotes the probability that the n-th video segment has action length l_n given that its action class is c_n, and is called the length model; p(c_n | c_{n−1}) denotes the probability that a video segment with action class c_{n−1} is followed by a segment with action class c_n, and is called the context model.
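For illustration, a small numpy sketch that scores one candidate segmentation according to the factorization in equation (5); all names and array shapes are assumptions, and the visual, length and context models are passed in as log-probabilities.

```python
# Sketch of equation (5): log-score of one candidate segmentation
# (c_1..c_N with lengths l_1..l_N) as the sum of frame-wise visual scores,
# per-segment length scores and class-transition (context) scores.
import numpy as np

def segmentation_log_score(log_visual, segments, log_length, log_context):
    """
    log_visual:  (T, C) array of log p(x_t | c).
    segments:    list of (class c_n, length l_n) pairs covering all T frames.
    log_length:  function (l, c) -> log p(l | c).
    log_context: (C, C) array with log_context[c_prev, c] = log p(c | c_prev).
    """
    score, t, prev = 0.0, 0, None
    for c, l in segments:
        score += log_visual[t:t + l, c].sum()      # sum_t log p(x_t | c_n)
        score += log_length(l, c)                  # log p(l_n | c_n)
        if prev is not None:
            score += log_context[prev, c]          # log p(c_n | c_{n-1})
        t, prev = t + l, c
    assert t == log_visual.shape[0], "segments must cover all frames"
    return score

# Illustrative usage with random scores for T = 10 frames and C = 3 classes:
rng = np.random.default_rng(0)
log_vis = np.log(rng.dirichlet(np.ones(3), size=10))
log_ctx = np.log(np.full((3, 3), 1 / 3))
s = segmentation_log_score(log_vis, [(0, 4), (2, 6)], lambda l, c: -1.0, log_ctx)
```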
A single-layer GRU network with 256 gated recurrent units and a softmax output connects the input video frame sequence X = {x_1, ..., x_t, ..., x_T} of T frames in forward propagation. p(c | x_t) is the softmax score of the GRU network for action class c at the t-th frame x_t, and the visual model p(x_t | c) can be represented as the posterior probability p(c | x_t) divided by the class prior p(c):

p(x_t | c) ∝ p(c | x_t) / p(c)    (6)

where the prior distribution p(c) is the normalized frame frequency of the action in the training set: during training, the number of frames labeled with class label c over all video frame sequences is counted and then normalized to obtain the estimate of p(c). If a class label sequence c_1^N contains a class that has never been seen, that class is represented with 1/#classes, where #classes is the total number of classes.
The length model is implemented using a class-dependent Poisson distribution:

p(l | c) = (λ_c^l / l!) · exp(−λ_c)    (7)

where λ_c is the average length of action class c, λ_c^l is λ_c raised to the power l, l! is the factorial of l, and λ_c is updated at each iteration. When a training sample (X, c_1^N) contains a class that has never been seen, λ_c is defined as T/N, where N is the number of video segments and T is the number of frames of video X.
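A short sketch of the class-wise Poisson length model of equation (7), assuming λ_c is re-estimated each iteration as the mean observed length of class c and falls back to T/N for classes not yet seen:

```python
# Sketch of the Poisson length model of equation (7):
# p(l | c) = λ_c^l / l! * exp(-λ_c), with λ_c the mean observed length of
# class c, re-estimated each iteration; unseen classes fall back to T/N.
import math
from collections import defaultdict

def estimate_lambdas(segmentations, default_length):
    """segmentations: list of (class, length) pairs from the current Viterbi pass."""
    lengths = defaultdict(list)
    for c, l in segmentations:
        lengths[c].append(l)
    lambdas = defaultdict(lambda: default_length)     # λ_c = T/N for unseen classes
    for c, ls in lengths.items():
        lambdas[c] = sum(ls) / len(ls)                # mean length of class c
    return lambdas

def log_poisson_length(l, lam):
    """log p(l | c) for the Poisson length model."""
    return l * math.log(lam) - lam - math.lgamma(l + 1)   # lgamma(l+1) = log(l!)

# Illustrative usage: T = 700 frames split into N = 5 segments.
lambdas = estimate_lambdas([("pour_water", 120), ("stir", 80)], default_length=700 / 5)
score = log_poisson_length(100, lambdas["pour_water"])
```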
An auxiliary function Q(t, l, c, g) is defined, where t is the video frame index, l is the length of the last segment, c is the class label of the last segment, and g is the non-terminal context of the stochastic grammar, so that the optimal segmentation points between the actions in the video can be found according to equation (5). Q(t, l, c, g) is the best probability score over all segmentations of the first t frames that satisfy these conditions. If l > 1, no new segment is hypothesized at frame t:

Q(t, l, c, g) = Q(t−1, l−1, c, g) · p(x_t | c)    (8)

If l = 1, a new video segment is hypothesized at the t-th frame:

Q(t, 1, c, g) = max_{l̂, ĝ} { Q(t−1, l̂, ĉ, ĝ) · p(l̂ | ĉ) · p(c | ĝ) · p(x_t | c) }    (9)

where ĝ denotes a possible non-terminal context of the stochastic grammar, ĉ denotes the class label hypothesized in context ĝ, and l̂ denotes a possible length of that segment. The maximization runs over all possible lengths l̂ and all contexts ĝ, subject to the constraint that from ĝ the class label c and the non-terminal context g can be reached, i.e. hypothesizing class c causes the transition from ĝ to g.
From equations (8) and (9), the most likely segmentation of the complete video, covering both the l > 1 and the l = 1 case, is obtained by maximizing Q(T, l, c, g) over all lengths l, class labels c and grammar contexts g at the last frame T. By tracking the maximizing arguments l̂ and ĝ of equation (9), the best class labels ĉ_1^N and lengths l̂_1^N can be recovered. The results on the Breakfast dataset show that the final frame accuracy of the action segmentation in the weakly supervised setting is 41.5%.
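For illustration, the following simplified sketch aligns a fixed, ordered action transcript to the frames by dynamic programming. It omits the grammar context g of equations (8) and (9) and all names are assumptions, but it maximizes the same product of frame scores and length scores and back-tracks the frame-wise labels.

```python
# Simplified Viterbi-style alignment of an ordered transcript to T frames,
# omitting the grammar context g; scores are sums of log p(x_t | c) plus
# per-segment length scores, maximized by dynamic programming.
import math
import numpy as np

def align_transcript(log_visual, transcript, log_length):
    """
    log_visual: (T, C) array of frame scores log p(x_t | c).
    transcript: ordered list of N class indices [c_1, ..., c_N].
    log_length: function (l, c) -> log p(l | c), e.g. the Poisson model above.
    Returns one class label per frame for the best-scoring segmentation.
    """
    T, N = log_visual.shape[0], len(transcript)
    # cumulative frame scores per transcript position, for O(1) segment sums
    cum = np.zeros((N, T + 1))
    for n, c in enumerate(transcript):
        cum[n, 1:] = np.cumsum(log_visual[:, c])

    NEG = -math.inf
    best = np.full((N + 1, T + 1), NEG)
    back = np.zeros((N + 1, T + 1), dtype=int)
    best[0, 0] = 0.0
    for n in range(1, N + 1):
        c = transcript[n - 1]
        for t in range(n, T + 1):                  # segment n ends at frame t
            for s in range(n - 1, t):              # previous segment ended at frame s
                if best[n - 1, s] == NEG:
                    continue
                score = (best[n - 1, s]
                         + cum[n - 1, t] - cum[n - 1, s]   # frame scores for class c
                         + log_length(t - s, c))           # length score of segment n
                if score > best[n, t]:
                    best[n, t] = score
                    back[n, t] = s
    # back-track the segment boundaries and write frame-wise labels
    labels = np.zeros(T, dtype=int)
    t = T
    for n in range(N, 0, -1):
        s = back[n, t]
        labels[s:t] = transcript[n - 1]
        t = s
    return labels
```

With the grammar context included, the recursion additionally maximizes over the predecessor contexts ĝ from which class c can be hypothesized, as in equation (9).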
The specific embodiments described herein are merely illustrative of the spirit of the invention. Various modifications or additions may be made to the described embodiments or alternatives may be employed by those skilled in the art without departing from the spirit or ambit of the invention as defined in the appended claims.
Claims (2)
1. A behavior identification method based on weak supervised learning video segmentation is characterized by comprising the following steps:
step (1), initially dividing the whole video into N segments, and assigning to the segments class labels c_1^N = {c_1, ..., c_i, ..., c_N} and length labels l_1^N = {l_1, ..., l_N}, wherein c_i is the class of the i-th video segment, l_i is the length of the i-th video segment, and i ∈ {1, ..., N}; generating frame labels for the video segments with a Viterbi algorithm, using them to calculate the frame-by-frame cross-entropy loss L, and updating GRU network parameters by stochastic gradient descent on the gradient ∇L of the cross-entropy loss L over all video frames;
step (2), in the initial video segmentation (ĉ_i, l̂_i) obtained by the Viterbi algorithm in step (1), finding the optimal action segmentation points, and decomposing the segmentation to obtain a visual model, a length model and a context model, wherein ĉ_i denotes the predicted class of the i-th video segment and l̂_i denotes the predicted length of the i-th video segment;
step (3), connecting the input video frame data sequence in forward propagation by using a single-layer GRU network with 256 gated recurrent units and a softmax output to obtain the visual model, and obtaining the length model p(l_n | c_n) for the input video frame data sequence, wherein c_n is the class label of the n-th video segment, l_n is the length of the n-th video segment, and p(l_n | c_n) denotes the probability that the action length is l_n given that the action class is c_n;
step (4), defining an auxiliary function, and finding the optimal segmentation points among the video actions;
and step (5), obtaining the most likely segmentation of the complete video from the length model in step (3) and the auxiliary function in step (4).
2. The method according to claim 1, characterized in that
step (1) specifically comprises:
setting X = {x_1, ..., x_t, ..., x_T} for a video containing T frames, dividing the whole video into N segments, assigning a class label to each segment, and outputting the video segment class labels c_1^N = {c_1, ..., c_i, ..., c_N} and the video segment length labels l_1^N = {l_1, ..., l_N}, wherein c_i is the class of the i-th segment, l_i is the length of the i-th segment, and i ∈ {1, ..., N}; defining the class label assigned to frame x_t as c_{n(t)}, wherein n(t) is the segment index of the t-th frame and t ∈ {1, ..., T}; and inferring the most likely segmentation (ĉ_1^N, l̂_1^N) of the video by

(ĉ_1^N, l̂_1^N) = argmax_{N, c_1^N, l_1^N} ∏_{i=1}^{N} p(c_i, l_i | X)    (1)

wherein p(c_i, l_i | X) is the probability of the action class and action length of the i-th segment of video X, ĉ_i is the predicted class of the i-th video segment, and l̂_i is the predicted length of the i-th video segment;
forwarding the video frame sequence X and its class labels c_1^N through the neural network, wherein c_1^N is provided as the ground-truth class transcript, so that only the length label of each video segment needs to be inferred during training; calling the class label assigned to frame x_t c_{n(t)}; writing the action classes and lengths (c_i, l_i) of the video as frame-by-frame class labels c_{n(1)}, ..., c_{n(t)}, ..., c_{n(T)} with the Viterbi algorithm, wherein c_{n(t)} ∈ {c_1, ..., c_i, ..., c_N}; and calculating the cross-entropy loss of all video frames:

L = − ∑_{t=1}^{T} log p(c_{n(t)} | x_t)    (2)

wherein p(c_{n(t)} | x_t) is the probability of action class c_{n(t)} for video frame x_t, and −log p(c_{n(t)} | x_t) is the cross-entropy loss of frame x_t; based on the cross-entropy loss L of all video frames, updating the GRU network parameters by stochastic gradient descent on its gradient ∇L, which in turn updates the posterior probabilities used in equation (6);
storing recently processed video frame sequences and their inferred frame labels in a buffer, sampling K frames from the buffer, and adding them to the loss function:

L = − ∑_{k=1}^{K} log p(c_k | x_k)    (3)

wherein x_k is the k-th sampled frame of the buffered video frames, K is the total number of sampled buffer frames, and c_k is the class label corresponding to frame x_k;
the step (2) specifically comprises:
decomposing the objective of equation (1) as

(ĉ_1^N, l̂_1^N) = argmax_{N, c_1^N, l_1^N} p(x_1^T | c_1^N, l_1^N) · p(l_1^N | c_1^N) · p(c_1^N)    (4)

and, assuming that the video frames are independent of each other, converting the argmax in equation (4) into equation (5):

(ĉ_1^N, l̂_1^N) = argmax_{N, c_1^N, l_1^N} { ∏_{t=1}^{T} p(x_t | c_{n(t)}) · ∏_{n=1}^{N} p(l_n | c_n) · p(c_n | c_{n−1}) }    (5)

wherein n(t) is the segment index of frame t; p(x_t | c_{n(t)}) denotes the probability of video frame x_t given its class label c_{n(t)} and is the visual model; p(l_n | c_n) denotes the probability that the n-th video segment has action length l_n given that its action class is c_n and is the length model; p(c_n | c_{n−1}) denotes the probability that a video segment with action class c_{n−1} is followed by a segment with action class c_n and is the context model.
The step (3) specifically comprises:
connecting the input video frame sequence X = {x_1, ..., x_t, ..., x_T} of T frames in forward propagation by using a single-layer GRU network with 256 gated recurrent units and a softmax output, wherein p(c | x_t) is the softmax score of the GRU network for action class c at the t-th frame x_t, and the visual model p(x_t | c) can be represented as the posterior probability p(c | x_t) divided by the class prior p(c):

p(x_t | c) ∝ p(c | x_t) / p(c)    (6)

wherein the prior distribution p(c) is the normalized frame frequency of the action in the training set; during training, the number of frames labeled with class label c over all video frame sequences is counted and then normalized to obtain the estimate of p(c); if a class label sequence c_1^N contains a class that has never been seen, that class is represented with 1/#classes, wherein #classes is the total number of classes;
implementing the length model using a class-dependent Poisson distribution:

p(l | c) = (λ_c^l / l!) · exp(−λ_c)    (7)

wherein λ_c is the average length of action class c, λ_c^l is λ_c raised to the power l, l! is the factorial of l, and λ_c is updated at each iteration; when a training sample (X, c_1^N) contains a class that has never been seen, λ_c is defined as T/N, wherein N is the number of video segments and T is the number of frames of video X;
the step (4) specifically comprises:
defining an auxiliary function Q(t, l, c, g), wherein t is the video frame index, l is the length of the last segment, c is the class label of the last segment, and g is the non-terminal context of the stochastic grammar, and finding the optimal segmentation points between the actions in the video according to equation (5), wherein Q(t, l, c, g) is the best probability score over all segmentations of the first t frames that satisfy these conditions; if l > 1, no new segment is hypothesized at frame t:

Q(t, l, c, g) = Q(t−1, l−1, c, g) · p(x_t | c)    (8)

if l = 1, a new video segment is hypothesized at the t-th frame:

Q(t, 1, c, g) = max_{l̂, ĝ} { Q(t−1, l̂, ĉ, ĝ) · p(l̂ | ĉ) · p(c | ĝ) · p(x_t | c) }    (9)

wherein ĝ denotes a possible non-terminal context of the stochastic grammar, ĉ denotes the class label hypothesized in context ĝ, and l̂ denotes a possible length of that segment; the maximization runs over all possible lengths l̂ and all contexts ĝ, subject to the constraint that from ĝ the class label c and the non-terminal context g can be reached, i.e. hypothesizing class c causes the transition from ĝ to g.
From equations (8) and (9), the most likely segmentation of the complete video, covering both the l > 1 and the l = 1 case, is obtained by maximizing Q(T, l, c, g) over all lengths l, class labels c and grammar contexts g at the last frame T and tracking back the maximizing arguments of equation (9) to recover the class labels ĉ_1^N and lengths l̂_1^N.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110207458.4A CN112861758B (en) | 2021-02-24 | 2021-02-24 | Behavior identification method based on weak supervised learning video segmentation |
NL2029182A NL2029182B1 (en) | 2021-02-24 | 2021-09-14 | Weakly supervised learning based method for recognizing behavior through video segmentation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110207458.4A CN112861758B (en) | 2021-02-24 | 2021-02-24 | Behavior identification method based on weak supervised learning video segmentation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112861758A (en) | 2021-05-28
CN112861758B CN112861758B (en) | 2021-12-31 |
Family
ID=75991121
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110207458.4A Active CN112861758B (en) | 2021-02-24 | 2021-02-24 | Behavior identification method based on weak supervised learning video segmentation |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN112861758B (en) |
NL (1) | NL2029182B1 (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113813609A (en) * | 2021-06-02 | 2021-12-21 | 腾讯科技(深圳)有限公司 | Game music style classification method and device, readable medium and electronic equipment |
CN114118167A (en) * | 2021-12-04 | 2022-03-01 | 河南大学 | Action sequence segmentation method based on self-supervision less-sample learning and aiming at behavior recognition |
CN114697763A (en) * | 2022-04-07 | 2022-07-01 | 脸萌有限公司 | Video processing method, device, electronic equipment and medium |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10824903B2 (en) * | 2016-11-16 | 2020-11-03 | Facebook, Inc. | Deep multi-scale video prediction |
CN110543911A (en) * | 2019-08-31 | 2019-12-06 | 华南理工大学 | weak supervision target segmentation method combined with classification task |
CN111079646A (en) * | 2019-12-16 | 2020-04-28 | 中山大学 | Method and system for positioning weak surveillance video time sequence action based on deep learning |
CN111968150A (en) * | 2020-08-19 | 2020-11-20 | 中国科学技术大学 | Weak surveillance video target segmentation method based on full convolution neural network |
Non-Patent Citations (2)
Title |
---|
Muhammad Usman Rafique; Nathan Jacobs: "Weakly Supervised Building Segmentation from Aerial Images", IGARSS 2019 - 2019 IEEE International Geoscience and Remote Sensing Symposium *
Chen Huafeng: "Research on Key Technologies of Video Human Behavior Recognition", China Doctoral Dissertations Full-text Database (Information Science and Technology) *
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113813609A (en) * | 2021-06-02 | 2021-12-21 | 腾讯科技(深圳)有限公司 | Game music style classification method and device, readable medium and electronic equipment |
CN113813609B (en) * | 2021-06-02 | 2023-10-31 | 腾讯科技(深圳)有限公司 | Game music style classification method and device, readable medium and electronic equipment |
CN114118167A (en) * | 2021-12-04 | 2022-03-01 | 河南大学 | Action sequence segmentation method based on self-supervision less-sample learning and aiming at behavior recognition |
CN114118167B (en) * | 2021-12-04 | 2024-02-27 | 河南大学 | Action sequence segmentation method aiming at behavior recognition and based on self-supervision less sample learning |
CN114697763A (en) * | 2022-04-07 | 2022-07-01 | 脸萌有限公司 | Video processing method, device, electronic equipment and medium |
US11699463B1 (en) | 2022-04-07 | 2023-07-11 | Lemon Inc. | Video processing method, electronic device, and non-transitory computer-readable storage medium |
CN114697763B (en) * | 2022-04-07 | 2023-11-21 | 脸萌有限公司 | Video processing method, device, electronic equipment and medium |
Also Published As
Publication number | Publication date |
---|---|
CN112861758B (en) | 2021-12-31 |
NL2029182B1 (en) | 2023-02-15 |
NL2029182A (en) | 2022-09-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111814854B (en) | Target re-identification method without supervision domain adaptation | |
Richard et al. | Temporal action detection using a statistical language model | |
Zhang et al. | Category anchor-guided unsupervised domain adaptation for semantic segmentation | |
CN109299373B (en) | Recommendation system based on graph convolution technology | |
EP3832534B1 (en) | Video action segmentation by mixed temporal domain adaptation | |
CN112861758B (en) | Behavior identification method based on weak supervised learning video segmentation | |
US11640714B2 (en) | Video panoptic segmentation | |
US20220172456A1 (en) | Noise Tolerant Ensemble RCNN for Semi-Supervised Object Detection | |
CN109934261A (en) | A kind of Knowledge driving parameter transformation model and its few sample learning method | |
Chen et al. | Learning linear regression via single-convolutional layer for visual object tracking | |
WO2023109208A1 (en) | Few-shot object detection method and apparatus | |
CN113469186B (en) | Cross-domain migration image segmentation method based on small number of point labels | |
CN116644755B (en) | Multi-task learning-based few-sample named entity recognition method, device and medium | |
CN110458022B (en) | Autonomous learning target detection method based on domain adaptation | |
CN116363374B (en) | Image semantic segmentation network continuous learning method, system, equipment and storage medium | |
CN113591529A (en) | Action segmentation model processing method and device, computer equipment and storage medium | |
CN111753995A (en) | Local interpretable method based on gradient lifting tree | |
CN115292532A (en) | Remote sensing image domain adaptive retrieval method based on pseudo label consistency learning | |
Viet‐Uyen Ha et al. | High variation removal for background subtraction in traffic surveillance systems | |
CN114118207B (en) | Incremental learning image identification method based on network expansion and memory recall mechanism | |
Xiao et al. | Self-explanatory deep salient object detection | |
Fonseca et al. | Model-agnostic approaches to handling noisy labels when training sound event classifiers | |
CN110942463B (en) | Video target segmentation method based on generation countermeasure network | |
Song et al. | Test-time Adaptation in the Dynamic World with Compound Domain Knowledge Management | |
CN111723301B (en) | Attention relation identification and labeling method based on hierarchical theme preference semantic matrix |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |