
CN117853759A - Multi-target tracking method, system, equipment and storage medium - Google Patents

Multi-target tracking method, system, equipment and storage medium

Info

Publication number
CN117853759A
CN117853759A
Authority
CN
China
Prior art keywords
features
target
frame
image data
nodes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410262998.6A
Other languages
Chinese (zh)
Other versions
CN117853759B (en)
Inventor
顾雪平
阚国泽
张洪辉
潘晓东
王一帆
庞梦娇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Hairun Shuju Technology Co ltd
Original Assignee
Shandong Hairun Shuju Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Hairun Shuju Technology Co ltd filed Critical Shandong Hairun Shuju Technology Co ltd
Priority to CN202410262998.6A priority Critical patent/CN117853759B/en
Publication of CN117853759A publication Critical patent/CN117853759A/en
Application granted granted Critical
Publication of CN117853759B publication Critical patent/CN117853759B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/62Extraction of image or video features relating to a temporal dimension, e.g. time-based feature extraction; Pattern tracking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761Proximity, similarity or dissimilarity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/46Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of image processing and relates to a multi-target tracking method, system, device and storage medium. Bounding-box processing is performed on the image data in a video; the extracted appearance features and bounding-box features are then detected and associated, and a preliminary tracking trajectory is obtained through Top-K score screening; the trajectory is updated with the aid of a constructed graph to obtain the multi-target tracking trajectory, improving the accuracy of multi-target tracking. By perceiving the features of the image data in the adjacent past and future frames of the current frame and aggregating them separately, more context information is obtained, improving the ability to capture the continuity of target trajectories.

Description

Multi-target tracking method, system, equipment and storage medium
Technical Field
The invention belongs to the technical field of image processing and particularly relates to a multi-target tracking method, system, device and storage medium.
Background
In recent years, target tracking by analyzing image features of captured frames has been widely used in multi-target tracking. Tracking-by-detection has become the dominant paradigm in multi-object tracking (MOT): given detection results, tracking is treated as an association problem. This framework allows a variety of target cues to be incorporated into the tracking scheme. First, the smoothness of the target trajectory in the time domain is exploited, which follows from the high frame rate of the camera and the slow movement of targets. Second, the appearance features of each detected object are considered, since features from the same object should be similar while features from different objects typically differ. Finally, interaction cues between different targets are considered, including relationships between adjacent targets.
Current research on graph-based multi-object tracking broadly falls into two directions. One focuses on improving the cost: these methods use deep learning to improve edge costs, for example encoding reliable pairwise interactions between targets with a Siamese convolutional neural network (CNN); however, this approach ignores key characteristics of object motion in real scenes and produces association errors. The other focuses on graph construction: much work builds complex graph-optimization frameworks that combine multiple information sources to encode higher-order dependencies between detections. However, these methods cannot cope with the target occlusion and crowding present in complex real scenes, which leads to loss of target trajectories and degrades the accuracy of multi-target tracking.
Disclosure of Invention
The invention provides a multi-target tracking method, a multi-target tracking system, multi-target tracking equipment and a storage medium.
The technical scheme of the invention is as follows:
The invention provides a multi-target tracking method, which comprises the following steps:
S1: acquiring image data of a plurality of adjacent frames in a video, and performing bounding-box processing on the targets in the image data of the plurality of adjacent frames;
S2: extracting appearance features and bounding-box features by convolution from the bounding-box-processed image data of the plurality of adjacent frames, detecting and associating the appearance features and bounding-box features extracted from the plurality of adjacent frames respectively to obtain a plurality of trajectories, and screening the plurality of trajectories by Top-K score based on the extracted appearance features and bounding-box features to obtain a preliminary tracking trajectory;
S3: based on the preliminary tracking trajectory, constructing a graph with the bounding-box features as motion features and the appearance features as visual features, the motion features serving as the features of the edges of the constructed graph and the visual features as the features of the nodes; if two nodes satisfy all of the following conditions:
(1) the distance between the center coordinates of the two nodes is smaller than a preset distance;
(2) the cosine similarity between the features of the two nodes is greater than a cosine-similarity threshold;
(3) the intersection-over-union (IoU) of the two nodes is greater than an IoU threshold;
connecting the two nodes by an edge to obtain an updated trajectory;
S4: updating the graph based on the updated trajectory, separately aggregating the features of the connected nodes and edges in the adjacent past and future frames of the current frame, embedding them together with the node and edge features of the current frame as the updated node and edge features of the current frame, incrementing the current frame by one, and executing S1 until all frames in the video have been processed, to obtain the multi-target tracking trajectory.
Before the detection association in S2, the method of the invention further comprises optimizing the bounding-box features in the current-frame image data; specifically,
four vertices of the bounding-box features in the current-frame image data are obtained, and four epipolar lines are drawn through the corresponding four vertices; the four vertices of the bounding-box features in the image data of the adjacent future frame of the current frame are obtained according to a cost function and intersected with the four epipolar lines respectively to obtain the predicted bounding-box features in the adjacent future frame; if the intersection-over-union between the predicted bounding-box features in the adjacent future frame and the bounding-box features extracted from the adjacent future frame is greater than an IoU threshold, the bounding-box features in the current frame and the adjacent future frame are optimized according to the predicted bounding-box features, so that the optimized bounding-box features of the current frame are obtained for detection association.
The detection association in S2 of the invention is, specifically:
based on the extracted appearance features and bounding-box features, if the similarity of the appearance features extracted in adjacent frames is greater than an appearance-similarity threshold and the intersection-over-union of the bounding-box features extracted in adjacent frames is greater than an IoU threshold, the appearance features and bounding-box features extracted in the adjacent frames are associated respectively.
Obtaining the multi-target tracking trajectory in S4 of the invention further comprises classifying the edges of the multi-target tracking trajectory and predicting edge scores; specifically,
based on the features of the edges in the multi-target tracking trajectory, the probability that targets in the image data of adjacent past and future frames are the same target is computed using the Hungarian algorithm on the edge-score matrix, and the multi-target tracking trajectory in the current frame is retained if the probability is greater than a preset probability threshold.
Before the multi-target tracking trajectory is obtained in S4, the method further comprises detecting missed targets in the current-frame image data by a single-target tracking method.
Before the multi-target tracking trajectory is obtained in S4 of the invention, the method further comprises processing missed targets in consecutive frames of image data; specifically,
based on the extracted appearance features and bounding-box features, the cost between the missed target and each target in the multi-target tracking trajectories is computed; if the cost between the missed target and a certain target in the trajectories is smaller than a preset cost threshold, the missed target is matched with that target, under the constraint that one multi-target tracking trajectory is associated with at most one missed target and one missed target with at most one trajectory.
The bounding-box processing in S1 comprises determining the height of the bounding box, the width of the bounding box, the center of the bounding box, and the frame index.
The present invention also provides a multi-target tracking system comprising:
an image preprocessing module: for acquiring image data of a plurality of adjacent frames in a video and performing bounding-box processing on the targets in the image data of the plurality of adjacent frames;
a preliminary tracking trajectory generation module: for extracting appearance features and bounding-box features by convolution from the bounding-box-processed image data of the plurality of adjacent frames, detecting and associating the appearance features and bounding-box features extracted from the plurality of adjacent frames respectively to obtain a plurality of trajectories, and screening the plurality of trajectories by Top-K score based on the extracted appearance features and bounding-box features to obtain a preliminary tracking trajectory;
a graph construction module: for constructing a graph, based on the preliminary tracking trajectory, with the bounding-box features as motion features and the appearance features as visual features, the motion features serving as the features of the edges of the constructed graph and the visual features as the features of the nodes; if two nodes satisfy all of the following conditions:
(1) the distance between the center coordinates of the two nodes is smaller than a preset distance;
(2) the cosine similarity between the features of the two nodes is greater than a cosine-similarity threshold;
(3) the intersection-over-union (IoU) of the two nodes is greater than an IoU threshold;
the two nodes are connected by an edge to obtain an updated trajectory;
a multi-target tracking trajectory generation module: for updating the graph based on the updated trajectory, separately aggregating the features of the connected nodes and edges in the adjacent past and future frames of the current frame, embedding them together with the node and edge features of the current frame as the updated node and edge features of the current frame, incrementing the current frame by one, and returning to the image preprocessing module until all frames in the video have been processed, to obtain the multi-target tracking trajectory.
The invention also provides a multi-target tracking device, comprising a processor and a memory, wherein the multi-target tracking method is implemented when the processor executes a computer program stored in the memory.
The invention also provides a multi-target tracking storage medium for storing a computer program, wherein the computer program, when executed by a processor, implements the multi-target tracking method.
The beneficial effects are as follows: the method performs bounding-box processing on the image data in the video, detects and associates the extracted appearance features and bounding-box features respectively, obtains a preliminary tracking trajectory through Top-K score screening, and updates the trajectory with the constructed graph to obtain the multi-target tracking trajectory, improving the accuracy of multi-target tracking;
the invention introduces epipolar lines to obtain the optimized position of the current frame's target bounding box, and both the appearance features and bounding-box features of the target are considered in trajectory generation, improving the accuracy of target association in complex scenes; association across adjacent frames reduces the influence of camera motion and improves overall association accuracy;
by perceiving and separately aggregating the embedded information of node and edge features in the adjacent past and future frames of the current frame, more context information is obtained, so the continuity of the target trajectory is maintained, especially in complex scenes such as occlusion and camera movement; this alleviates trajectory loss caused by occlusion or camera motion in real scenes and improves the ability to capture the continuity of target trajectories.
Drawings
Figure 1 is a flow chart of the multi-target tracking method of the present application.
Fig. 2 is a schematic diagram of detecting predicted bounding-box features in the image data of adjacent future frames based on epipolar lines, where (a) is the first target detected in the t-th frame, (b) shows the targets detected in the (t+1)-th frame, (c) is the result of intersecting the predicted target bounding box of the (t+1)-th frame with the four epipolar lines, and (d) is the target-optimal prediction bounding box obtained based on the epipolar lines.
Fig. 3 is a schematic diagram of node updates during message passing in the constructed graph under different schemes, where (a) is the initial setting of the node update during message passing, (b) is the prior-art node update, and (c) is the node update of the present application.
Detailed Description
The following examples are intended to illustrate the invention, but not to limit it further.
The invention provides a multi-target tracking method, as shown in Figure 1, comprising the following steps:
S1: acquiring image data of a plurality of adjacent frames in a video, and performing bounding-box processing on the targets in the image data of the plurality of adjacent frames;
S2: extracting appearance features and bounding-box features by convolution from the bounding-box-processed image data of the plurality of adjacent frames, detecting and associating the appearance features and bounding-box features extracted from the plurality of adjacent frames respectively to obtain a plurality of trajectories, and screening the plurality of trajectories by Top-K score based on the extracted appearance features and bounding-box features to obtain a preliminary tracking trajectory;
S3: based on the preliminary tracking trajectory, constructing a graph with the bounding-box features as motion features and the appearance features as visual features, the motion features serving as the features of the edges of the constructed graph and the visual features as the features of the nodes; if two nodes satisfy all of the following conditions:
(1) the distance between the center coordinates of the two nodes is smaller than a preset distance;
(2) the cosine similarity between the features of the two nodes is greater than a cosine-similarity threshold;
(3) the intersection-over-union (IoU) of the two nodes is greater than an IoU threshold;
connecting the two nodes by an edge to obtain an updated trajectory;
S4: updating the graph based on the updated trajectory, separately aggregating the features of the connected nodes and edges in the adjacent past and future frames of the current frame, embedding them together with the node and edge features of the current frame as the updated node and edge features of the current frame, incrementing the current frame by one, and executing S1 until all frames in the video have been processed, to obtain the multi-target tracking trajectory.
The method performs bounding-box processing on the image data in the video, detects and associates the extracted appearance features and bounding-box features respectively, obtains a preliminary tracking trajectory through Top-K score screening, and updates the trajectory in combination with the constructed graph to obtain the multi-target tracking trajectory, improving the accuracy of multi-target tracking.
S1: acquiring image data of a plurality of adjacent frames in the video, and performing bounding-box processing on the targets in the image data of the plurality of adjacent frames.
To further achieve accurate tracking of the target position, the bounding-box processing in S1 includes determining the height h_t of the bounding box, the width w_t of the bounding box, the center (x_t, y_t) of the bounding box, and the frame index t.
The method adopts bounding boxes to preliminarily localize target positions in the image data, where a trajectory is formed by connecting a series of bounding boxes across different frames; a bounding box on a trajectory, generated in the t-th frame, represents one target. Let W denote the set of targets that appeared before the t-th frame of image data. Each element w ∈ W then represents the track of a target across different frames, i.e., a set of consecutive detections of the same target at different frames over a period of time. In addition, D_t denotes the set of objects to be detected.
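To make this bookkeeping concrete, the following is a minimal Python sketch of the detection and track structures implied by the notation above; the class and field names are illustrative assumptions, since the patent does not prescribe an implementation.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Detection:
    """One bounding box in one frame: center (x_t, y_t), width w_t, height h_t, frame index t."""
    x: float
    y: float
    w: float
    h: float
    t: int                     # frame index
    score: float               # detection score D_score
    appearance: List[float]    # appearance feature vector of dimension d_ob

@dataclass
class Track:
    """One element w of W: consecutive detections of the same target across frames."""
    track_id: int
    detections: List[Detection] = field(default_factory=list)

    def last(self) -> Detection:
        return self.detections[-1]
```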
S2: and (3) extracting appearance characteristics and boundary frame characteristics from the image data of a plurality of adjacent frames processed by the boundary frame after convolution, respectively detecting and correlating the appearance characteristics and the boundary frame characteristics extracted from the plurality of adjacent frames to obtain a plurality of tracks, and screening the plurality of tracks through Top-K scores based on the extracted appearance characteristics and the boundary frame characteristics to obtain a preliminary tracking track.
The method simultaneously extracts the appearance features and bounding-box features of every target in the t-th frame image data and defines a detection score D_score based on the appearance features. That is, the complete track of each object contains appearance features of dimension d_ob together with the corresponding bounding-box features.
Appearance features describe the appearance of the target and generally include its color, texture and shape.
Bounding-box features are the bounding-box parameters of each detection result, including height and width.
The detection score can be regarded as a measure of the target's appearance features, representing the salience or confidence of the target in the image.
In addition, it should be noted that, because detection results are unreliable, the complete track of an object may be split into track fragments.
In addition, before the detection association, the method further optimizes the bounding-box features in the current-frame image data. Considering that rapid camera movement may affect the accuracy of target tracking, the application assumes that the target moves slowly or is stationary, and first introduces epipolar lines to predict the bounding-box features in the image data of adjacent future frames.
Specifically, the four vertices of the bounding-box features in the adjacent past frame of the current frame and in the current-frame image data are obtained respectively, and four epipolar lines are drawn through the corresponding four vertices; the four vertices of the bounding-box features in the image data of the adjacent future frame of the current frame are obtained according to a cost function and intersected with the four epipolar lines respectively to obtain the predicted bounding-box features in the adjacent future frame; if the intersection-over-union between the predicted bounding-box features in the adjacent future frame and the bounding-box features extracted from the adjacent future frame is greater than the IoU threshold, the bounding-box features in the current frame and the adjacent future frame are optimized according to the predicted bounding-box features, yielding the optimized bounding-box features of the current frame for detection association.
For example, the four vertices of the target bounding box in the t-th frame are defined as J_{i,t}, where i ∈ {1, 2, 3, 4}; similarly, J_{i,t+1}, i ∈ {1, 2, 3, 4}, are defined as the vertices of the bounding box in the (t+1)-th frame. A cost function combining two terms is then defined,

C = C_epi + C_size,

where C_epi ensures that the predicted (t+1)-th-frame target bounding box intersects the four corresponding epipolar lines as closely as possible, and C_size is a target-size constraint that keeps the predicted (t+1)-th-frame bounding box aligned with the true position of the target in frame t+1 as much as possible. Using this cost function guarantees the accuracy of the predicted bounding-box position in frame t+1.
Furthermore, the bounding-box features in the current-frame image data are optimized through the fundamental matrix η, which is estimated by matching feature points (SURF points) between the t-th and (t+1)-th frame image data with the RANSAC algorithm.
Here, SURF (Speeded Up Robust Features) is a feature-point algorithm in computer vision. Using RANSAC to match SURF points between two consecutive frames to estimate the fundamental matrix means that the feature points extracted by SURF are used for image-to-image matching. This matching yields the fundamental matrix between the two images, enabling relative localization or motion estimation between them. The RANSAC algorithm helps exclude false matches and improves matching accuracy.
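As a sketch of this step with OpenCV, the fundamental matrix η can be estimated from RANSAC-filtered SURF matches between frames t and t+1, and the four epipolar lines obtained for the box vertices; SURF requires the opencv-contrib build, and the function name and parameter values below are assumptions rather than values from the patent.

```python
import numpy as np
import cv2

def epipolar_lines_for_box(img_t, img_t1, box_vertices_t):
    """Estimate the fundamental matrix (eta) between frame t and frame t+1 and
    return the epipolar lines in frame t+1 through the four box vertices J_{i,t}.
    box_vertices_t: (4, 2) array with the corners X_{1,t}..X_{4,t}."""
    surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)  # needs opencv-contrib
    k1, d1 = surf.detectAndCompute(img_t, None)
    k2, d2 = surf.detectAndCompute(img_t1, None)

    matches = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True).match(d1, d2)
    pts1 = np.float32([k1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([k2[m.trainIdx].pt for m in matches])

    # RANSAC rejects false matches while estimating the fundamental matrix
    eta, inlier_mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 3.0, 0.99)

    # Each returned line (a, b, c) satisfies a*x + b*y + c = 0 in frame t+1
    pts = box_vertices_t.reshape(-1, 1, 2).astype(np.float32)
    lines = cv2.computeCorrespondEpilines(pts, 1, eta).reshape(-1, 3)
    return eta, lines
```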
As shown in Fig. 2, (a) is the first target detected in the t-th frame, where X_{1,t}, X_{2,t}, X_{3,t} and X_{4,t} denote the positions of the upper-left, upper-right, lower-right and lower-left corners of the target bounding box detected in the t-th frame. (b) shows the targets detected in the (t+1)-th frame: the dashed box is the predicted position, in frame t+1, of the first target detected in frame t, and the two solid boxes are the actual positions of the two targets detected in frame t+1.
As can be seen from (a) and (b), in the (t+1)-th frame the predicted position (dashed box) of the first target detected in frame t has a larger intersection-over-union (IoU) with the actual position of the other target (right solid box), indicating high similarity or overlap between two different targets, while the predicted position does not overlap well with the actual position of the target itself (left solid box). This shows that, without epipolar lines, the tracking method is inaccurate and prone to association errors.
IoU (intersection-over-union) is an indicator of how well two bounding boxes overlap: the larger the IoU, the larger the overlapping portion of the two boxes, i.e., the higher the similarity between the targets.
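This criterion only needs the standard IoU computation; a minimal sketch for boxes in corner format (x1, y1, x2, y2):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```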
The tracking method with epipolar lines works as follows. First, assuming the target is stationary or moves slowly, the four vertices X_{i,t} of the target's bounding box in the t-th frame should lie on the corresponding epipolar lines of the (t+1)-th frame; that is, the predicted target bounding box in frame t+1 should intersect the four epipolar lines as much as possible, as shown in Fig. 2(c). After the epipolar lines are introduced, the predicted bounding-box position in frame t+1 (white box in Fig. 2(d)) overlaps the target's actual bounding-box position as much as possible.
Second, assuming that the size of the bounding box does not vary much between adjacent frames, a target-optimal prediction bounding box can be obtained, shown as the dark box in Fig. 2(d), where X_{1,t+1}, X_{2,t+1}, X_{3,t+1} and X_{4,t+1} denote the positions of the upper-left, upper-right, lower-right and lower-left corners of the target-optimal prediction bounding box in frame t+1.
By introducing epipolar lines to obtain the optimized position of the current frame's target bounding box, and by considering both the appearance features and bounding-box features of the target during trajectory generation, the accuracy of target association in complex scenes is improved; association across adjacent frames reduces the influence of camera motion and improves overall association accuracy.
In addition, to make trajectory generation simpler, based on the extracted appearance features and bounding-box features, if the similarity of the appearance features extracted in adjacent frames is greater than the appearance-similarity threshold and the IoU of the bounding-box features extracted in adjacent frames is greater than the IoU threshold, the appearance features and bounding-box features in the adjacent frames are associated respectively, keeping association errors as small as possible.
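A minimal sketch of this dual-threshold association test; the threshold values are illustrative placeholders, cosine similarity stands in for the unspecified appearance-similarity measure, and iou() is the helper sketched above.

```python
import numpy as np

def can_associate(feat_a, box_a, feat_b, box_b, sim_thresh=0.7, iou_thresh=0.3):
    """Associate two detections from adjacent frames only when BOTH the
    appearance similarity and the bounding-box IoU clear their thresholds."""
    fa, fb = np.asarray(feat_a, float), np.asarray(feat_b, float)
    cos_sim = float(fa @ fb / (np.linalg.norm(fa) * np.linalg.norm(fb) + 1e-12))
    return cos_sim > sim_thresh and iou(box_a, box_b) > iou_thresh
```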
Further, when a threshold is used to screen trajectories, the threshold setting is sensitive to the distribution of detection scores, requiring calibration for different datasets and detectors.
Top-K score screening can compensate for such missed detections: when some targets are not detected correctly, detection results with high scores are selected, so even if the detector does not completely cover all targets, the proposed multi-target tracking method still has a chance to capture missed targets.
The application therefore performs Top-K score screening on the trajectories and selects high-scoring detection results to compensate for missed target detections.
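One plausible reading of Top-K screening, sketched below under the assumption that a candidate trajectory is scored by the mean detection score of its boxes (the text does not fix this choice) and that tracks follow the Track structure sketched earlier.

```python
def top_k_tracks(tracks, k):
    """Keep the K highest-scoring candidate trajectories instead of applying a
    hard score threshold, so weakly detected targets are not dropped outright."""
    def track_score(w):
        return sum(d.score for d in w.detections) / max(len(w.detections), 1)
    return sorted(tracks, key=track_score, reverse=True)[:k]
```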
S3: based on the preliminary tracking trajectory, constructing a graph with the bounding-box features as motion features and the appearance features as visual features, the motion features serving as the features of the edges of the constructed graph and the visual features as the features of the nodes; if two nodes satisfy all of the following conditions:
(1) the distance between the center coordinates of the two nodes is smaller than a preset distance;
(2) the cosine similarity between the features of the two nodes is greater than a cosine-similarity threshold;
(3) the intersection-over-union (IoU) of the two nodes is greater than an IoU threshold;
the two nodes are connected by an edge to obtain an updated trajectory.
For the trajectory-generation process, a graph model is defined and the video data are converted into a graph: the track of each target is regarded as a node, and an edge is generated by associating two nodes.
Specifically, the graph is defined as G = (V, E), with the motion features and visual features taken as the feature sets of the edges E and the nodes V respectively; an edge embedding and a node embedding are generated for each edge and each node. Thus every node v ∈ V has a node embedding h_v, and every edge e ∈ E has an edge embedding h_e.
The trajectories of different targets are set as different nodes (e.g., N_oi and N_oj), and N_oi and N_oj are connected only when the corresponding conditions are satisfied. Specifically, connecting N_oi and N_oj requires three conditions: (1) the distance between the center coordinates of the two nodes is smaller than a preset distance; (2) the cosine similarity between the features of the two nodes is greater than the cosine-similarity threshold; (3) the IoU of the two nodes is greater than the IoU threshold. For each of the above conditions, a given number of candidate nodes N_oj is selected to connect with N_oi, without repeated connections. Since the connections between nodes are bidirectional, the features of both N_oi and N_oj are updated.
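A sketch of this construction rule, assuming the Track/Detection structures sketched earlier; corners() is an assumed helper converting a detection to corner format, iou() is the helper above, and the thresholds are placeholders.

```python
import math
import numpy as np

def cosine(fa, fb):
    fa, fb = np.asarray(fa, float), np.asarray(fb, float)
    return float(fa @ fb / (np.linalg.norm(fa) * np.linalg.norm(fb) + 1e-12))

def build_graph(nodes, dist_max, cos_thresh, iou_thresh):
    """Connect two track nodes N_oi, N_oj by an edge only when all three
    conditions hold; each undirected edge is stored once, avoiding repeats."""
    edges = []
    for i, ni in enumerate(nodes):
        for j in range(i + 1, len(nodes)):
            a, b = ni.last(), nodes[j].last()
            if (math.hypot(a.x - b.x, a.y - b.y) < dist_max
                    and cosine(a.appearance, b.appearance) > cos_thresh
                    and iou(corners(a), corners(b)) > iou_thresh):
                edges.append((i, j))
    return edges
```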
In addition, some tracks may become invisible for a short period because the target is completely occluded and cannot be tracked. These temporarily lost tracks are stored for a duration between t_min and t_max and are then added back to the nodes N_oi in the graph; requiring the storage time to exceed t_min prevents false-positive cases. Here t_max denotes the maximum time for which a track may be considered invisible, and t_min the shortest such time.
Then, the application introduces a binary variable for each edge in the graph. As in the classical minimum-cost flow formulation, the label of an edge is defined as 1 when the nodes it connects simultaneously satisfy: (i) the three connection conditions (1)-(3) above; and (ii) temporal continuity within the track. The labels of all remaining edges are defined as 0.
Specifically, a trajectory w_i is equivalent to a set of edges E_{w_i} corresponding to a sequentially ordered path in the constructed graph. Based on this, the label of each edge e ∈ E is defined by a binary variable y_e ∈ {0, 1}.
When y_e = 1, the edge e is considered active. It is assumed that the trajectories in W are node-disjoint, i.e., one node cannot belong to more than one trajectory. Therefore y must satisfy a set of linear constraints: for every node v ∈ V,

Σ_{e ∈ E_in(v)} y_e ≤ 1  and  Σ_{e ∈ E_out(v)} y_e ≤ 1,

where E_in(v) and E_out(v) denote the edges entering v from past frames and leaving v toward future frames.
These inequalities state that each node is connected by an active edge to at most one node in the past and at most one node in the future of the trajectory graph, which completes the construction of the graph and yields the updated trajectory.
S4: updating the graph based on the updated trajectory, separately aggregating the features of the connected nodes and edges in the adjacent past and future frames of the current frame, embedding them together with the node and edge features of the current frame as the updated node and edge features of the current frame, incrementing the current frame by one, and executing S1 until all frames in the video have been processed, to obtain the multi-target tracking trajectory.
The application uses a Message Passing Network (MPN) to propagate and update, over the whole graph G, the information contained in the edge features and node features. The propagation is divided into embedding updates of nodes and embedding updates of edges, called message-passing steps. Each message-passing step is further divided into two update procedures: an edge-to-node update (e → v) and a node-to-edge update (v → e). These updates are performed sequentially for a fixed number of iterations S.
In the actual updating process, after S iterations each node contains information from all other nodes at distance S in the graph. During node and edge updates, each node is also compared with its neighbors, and information from all neighbors is aggregated to update its embedding and obtain more context information.
However, the linear constraints above determine that each node in the graph can be connected to at most one node in the past and one node in the future of the trajectory graph. Aggregating the embedded information of all neighboring nodes at once therefore makes it difficult for the updated node features to capture whether these constraints are violated.
Thus, the application splits the aggregation into two parts to create a time-aware update rule: one part for future nodes and one for past nodes.
Specifically, N_i^{t-1} and N_i^{t+1} denote the neighbors of node i in the (t-1)-th and (t+1)-th frames. On this basis, two different perception functions are defined, Φ_fut and Φ_past, for the (t+1)-th frame and the (t-1)-th frame respectively. In the s-th message-passing iteration, every node i first computes edge-to-node embeddings for all its neighbors j in frames t-1 and t+1:

m_{(i,j)} = Φ_past([h_j^(s-1), h_{(i,j)}^(s-1), h_i^(0)]),  j ∈ N_i^{t-1},
m_{(i,j)} = Φ_fut([h_j^(s-1), h_{(i,j)}^(s-1), h_i^(0)]),   j ∈ N_i^{t+1},

where m_{(i,j)} is the edge-to-node embedding for neighbor j, h_i^(0) is the initial embedding, h_j^(s-1) is the node feature embedding of iteration s-1, and h_{(i,j)}^(s-1) is the edge feature embedding of iteration s-1; reusing h_i^(0) ensures that the original features are not forgotten during message passing.
Then the node and edge features of the (t-1)-th and (t+1)-th frames are aggregated separately and embedded into the preliminary tracking trajectory, with the aggregation:

h_i^{fut,(s)} = Σ_{j ∈ N_i^{t+1}} m_{(i,j)},   h_i^{past,(s)} = Σ_{j ∈ N_i^{t-1}} m_{(i,j)},

where h_i^{fut,(s)} is the aggregated embedding for frame t+1, h_i^{past,(s)} is the aggregated embedding for frame t-1, and m_{(i,j)} is the embedding contributed by neighbor j.
Finally, the preliminary tracking trajectory is updated by combining the two:

h_i^(s) = Φ_v([h_i^{past,(s)}, h_i^{fut,(s)}]),

where h_i^(s) is the node feature embedding of the s-th iteration and Φ_v is a learnable function.
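A compact PyTorch sketch of this time-aware edge-to-node update; the layer sizes and MLP shapes are assumptions, since the text only fixes the separate past/future aggregation, the reuse of the initial embedding h_i^(0), and the final learnable fusion Φ_v.

```python
import torch
import torch.nn as nn

class TimeAwareNodeUpdate(nn.Module):
    """One message-passing step: past and future neighbors are aggregated by
    separate perception functions, then fused by a learnable function Phi_v."""
    def __init__(self, node_dim, edge_dim):
        super().__init__()
        msg_in = 2 * node_dim + edge_dim   # [h_j^(s-1), h_(i,j)^(s-1), h_i^(0)]
        self.phi_past = nn.Sequential(nn.Linear(msg_in, node_dim), nn.ReLU())
        self.phi_fut = nn.Sequential(nn.Linear(msg_in, node_dim), nn.ReLU())
        self.phi_v = nn.Sequential(nn.Linear(2 * node_dim, node_dim), nn.ReLU())

    def forward(self, i, h, h0, edge_feat, past_nbrs, fut_nbrs):
        # h: dict node -> h^(s-1); h0: dict node -> h^(0);
        # edge_feat: dict keyed by (min(i,j), max(i,j)) -> current edge embedding
        def aggregate(nbrs, phi):
            msgs = [phi(torch.cat([h[j], edge_feat[(min(i, j), max(i, j))], h0[i]]))
                    for j in nbrs]
            return torch.stack(msgs).sum(0) if msgs else torch.zeros_like(h0[i])
        h_past = aggregate(past_nbrs, self.phi_past)   # sum over N_i^{t-1}
        h_fut = aggregate(fut_nbrs, self.phi_fut)      # sum over N_i^{t+1}
        return self.phi_v(torch.cat([h_past, h_fut]))  # new node embedding h_i^(s)
```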
By perceiving and separately aggregating the embedded information of node and edge features in the adjacent past and future frames of the current frame, more context information is obtained and the continuity of the target trajectory is maintained, especially in complex scenes with occlusion and camera movement; this alleviates the loss of target trajectories caused by occlusion or camera motion in real scenes and improves the ability to capture trajectory continuity.
Fig. 3 is a schematic diagram of node updates during message passing in the constructed graph under different schemes. The arrow direction indicates the time direction, with frames t-1, t and t+1. Numerals 1-5 denote different nodes in different frames: numeral 3 is the node in frame t, numerals 1 and 2 are its neighbor nodes in frame t-1, and numerals 4 and 5 are its neighbor nodes in frame t+1. Pentagonal boxes represent the embedded information of neighbor nodes; the circled-plus symbol represents the aggregation of different embeddings; the diamond represents a multi-layer perceptron.
Fig. 3(a) is the initial setting of the node update during message passing, in which only the embedded information of the neighbor nodes in frame t is considered. (b) is the prior-art node update, which aggregates the embedded information of all neighbor nodes at once. (c) is the node update of the present application: the embeddings of past and future frames are aggregated separately, then concatenated and fed into the multi-layer perceptron to obtain the new node embedding.
To achieve multi-target tracking in the presence of missed targets, S4 further includes, before obtaining the multi-target tracking trajectory, processing the missed targets in the t-th frame image data and in consecutive-frame image data.
For missed targets in the current-frame image data, a single-target tracking method is adopted to recover the missed targets of the t-th frame, combining targets with a high detection score D_score with the bounding boxes restored by the single-target tracking strategy.
For missed targets in consecutive frames of image data, a detection-recovery strategy is proposed that uses a linear motion model to recover them. Specifically,
based on the extracted appearance features and bounding-box features, the cost between the missed target and each target in the multi-target tracking trajectories is computed; if the cost between the missed target and a certain target in the trajectories is smaller than a preset cost threshold, the missed target is matched with that target, under the constraint that one multi-target tracking trajectory is associated with at most one missed target and one missed target with at most one trajectory.
Suppose that a target appearing in frame t-1 is a normal target; otherwise it is a missed target. Denote the i-th target in W as o_i and the j-th detection in D_t as d_j, where W is the set of targets appearing before the t-th frame of image data and D_t the set of objects to be detected. The assignment status between d_j and o_i is denoted a_{i,j} ∈ {0, 1}: a_{i,j} = 1 means target o_i is associated with detection d_j, and a_{i,j} = 0 the opposite. The assignment set is denoted A = {a_{i,j}}, where |D_t| is the number of objects to be detected and |W| the original number of targets. The optimal assignment set can be expressed as:

A* = argmin_A Σ_i Σ_j a_{i,j} · c(o_i, d_j),

where A* is the optimal assignment set, t_min is the minimum time considered when storing a trajectory (used to prevent false positives), σ is a hyper-parameter, f_{o_i} and f_{d_j} are the appearance features of o_i and d_j respectively, and c(o_i, d_j) is the cost between target o_i and detection d_j, i.e., the matching cost representing the similarity or degree of matching between the two.
In addition, during target-detection recovery, one detection can be associated with at most one target, and one target with at most one detection. The constraints are:

Σ_i a_{i,j} ≤ 1 for every detection d_j,   Σ_j a_{i,j} ≤ 1 for every target o_i.

At the same time, the constraints allow Σ_i a_{i,j} = 0 and Σ_j a_{i,j} = 0, i.e., a detection in the current frame may correspond to no existing target.
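A sketch of this one-to-one recovery matching with SciPy's Hungarian solver; the cost function c(o_i, d_j) is passed in as a parameter, since its exact form (beyond depending on the appearance features and σ) is not spelled out here, and the threshold is a placeholder.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def recover_missed(lost_targets, detections, cost_fn, cost_thresh):
    """Match lost targets o_i to detections d_j one-to-one: Hungarian assignment
    minimizes the total cost, and pairs whose cost is not below the threshold
    are discarded, which realizes a_{i,j} = 0 for unmatched rows and columns."""
    C = np.array([[cost_fn(o, d) for d in detections] for o in lost_targets])
    rows, cols = linear_sum_assignment(C)
    return [(i, j) for i, j in zip(rows, cols) if C[i, j] < cost_thresh]
```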
To obtain the tracked target trajectories more accurately, edge-classification processing is performed on the multi-target tracking trajectories. Obtaining the multi-target tracking trajectory in S4 further comprises classifying the edges of the trajectory and predicting edge scores; specifically,
based on the features of the edges in the multi-target tracking trajectory, the probability that targets in the image data of adjacent past and future frames are the same target is computed using the Hungarian algorithm on the edge-score matrix; if the probability is greater than a preset probability threshold, the multi-target tracking trajectory in the current frame is retained.
Since a node N_oi may be connected to several nodes N_oj, the Hungarian algorithm is applied to the edge-score matrix to obtain the best matching; thus each node retains only one best-matching edge score.
The edge score evaluates the probability that the connected tracks belong to the same target within the time span; predicting these scores facilitates trajectory matching in the graph.
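A sketch of this best-match selection on the edge-score matrix, again using SciPy's Hungarian solver; the probability threshold is a placeholder.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def best_edge_matches(edge_scores, prob_thresh=0.5):
    """edge_scores[i][j]: predicted probability that node i (past frame) and node j
    (future frame) are the same target. Hungarian matching keeps exactly one best
    edge per node; pairs below the probability threshold are dropped."""
    S = np.asarray(edge_scores, float)
    rows, cols = linear_sum_assignment(-S)   # negate scores to maximize probability
    return [(i, j) for i, j in zip(rows, cols) if S[i, j] > prob_thresh]
```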
The present application also provides a multi-target tracking system comprising:
an image preprocessing module: for acquiring image data of a plurality of adjacent frames in a video and performing bounding-box processing on the targets in the image data of the plurality of adjacent frames;
a preliminary tracking trajectory generation module: for extracting appearance features and bounding-box features by convolution from the bounding-box-processed image data of the plurality of adjacent frames, detecting and associating the appearance features and bounding-box features extracted from the plurality of adjacent frames respectively to obtain a plurality of trajectories, and screening the plurality of trajectories by Top-K score based on the extracted appearance features and bounding-box features to obtain a preliminary tracking trajectory;
a graph construction module: for constructing a graph, based on the preliminary tracking trajectory, with the bounding-box features as motion features and the appearance features as visual features, the motion features serving as the features of the edges of the constructed graph and the visual features as the features of the nodes; if two nodes satisfy all of the following conditions:
(1) the distance between the center coordinates of the two nodes is smaller than a preset distance;
(2) the cosine similarity between the features of the two nodes is greater than a cosine-similarity threshold;
(3) the intersection-over-union (IoU) of the two nodes is greater than an IoU threshold;
the two nodes are connected by an edge to obtain an updated trajectory;
a multi-target tracking trajectory generation module: for updating the graph based on the updated trajectory, separately aggregating the features of the connected nodes and edges in the adjacent past and future frames of the current frame, embedding them together with the node and edge features of the current frame as the updated node and edge features of the current frame, incrementing the current frame by one, and returning to the image preprocessing module until all frames in the video have been processed, to obtain the multi-target tracking trajectory.
The application also provides a multi-target tracking device, comprising a processor and a memory, wherein the multi-target tracking method is implemented when the processor executes a computer program stored in the memory.
The application also provides a multi-target tracking storage medium for storing a computer program, wherein the computer program, when executed by a processor, implements the multi-target tracking method.

Claims (10)

1. A multi-target tracking method, comprising the following steps:
S1: acquiring image data of a plurality of adjacent frames in a video, and performing bounding-box processing on the targets in the image data of the plurality of adjacent frames;
S2: extracting appearance features and bounding-box features by convolution from the bounding-box-processed image data of the plurality of adjacent frames, detecting and associating the appearance features and bounding-box features extracted from the plurality of adjacent frames respectively to obtain a plurality of trajectories, and screening the plurality of trajectories by Top-K score based on the extracted appearance features and bounding-box features to obtain a preliminary tracking trajectory;
S3: based on the preliminary tracking trajectory, constructing a graph with the bounding-box features as motion features and the appearance features as visual features, the motion features serving as the features of the edges of the constructed graph and the visual features as the features of the nodes; if two nodes satisfy all of the following conditions:
(1) the distance between the center coordinates of the two nodes is smaller than a preset distance;
(2) the cosine similarity between the features of the two nodes is greater than a cosine-similarity threshold;
(3) the intersection-over-union (IoU) of the two nodes is greater than an IoU threshold;
connecting the two nodes by an edge to obtain an updated trajectory;
S4: updating the graph based on the updated trajectory, separately aggregating the features of the connected nodes and edges in the adjacent past and future frames of the current frame, embedding them together with the node and edge features of the current frame as the updated node and edge features of the current frame, incrementing the current frame by one, and executing S1 until all frames in the video have been processed, to obtain the multi-target tracking trajectory.
2. The multi-target tracking method according to claim 1, wherein before the detection association in S2 the method further comprises optimizing the bounding-box features in the current-frame image data, specifically:
four vertices of the bounding-box features in the current-frame image data are obtained, and four epipolar lines are drawn through the corresponding four vertices; the four vertices of the bounding-box features in the image data of the adjacent future frame of the current frame are obtained according to a cost function and intersected with the four epipolar lines respectively to obtain the predicted bounding-box features in the adjacent future frame; and if the intersection-over-union between the predicted bounding-box features in the adjacent future frame and the bounding-box features extracted from the adjacent future frame is greater than an IoU threshold, the bounding-box features in the current frame and the adjacent future frame are optimized according to the predicted bounding-box features, so that the optimized bounding-box features of the current frame are obtained for detection association.
3. The multi-target tracking method according to claim 1, wherein the detection association in S2 is specifically:
based on the extracted appearance features and bounding-box features, if the similarity of the appearance features extracted in adjacent frames is greater than an appearance-similarity threshold and the intersection-over-union of the bounding-box features extracted in adjacent frames is greater than an IoU threshold, associating the appearance features and bounding-box features extracted in the adjacent frames respectively.
4. The multi-target tracking method according to claim 1, wherein obtaining the multi-target tracking trajectory in S4 further comprises classifying the edges of the multi-target tracking trajectory and predicting edge scores, specifically:
based on the features of the edges in the multi-target tracking trajectory, computing the probability that targets in the image data of adjacent past and future frames are the same target using the Hungarian algorithm on the edge-score matrix, and retaining the multi-target tracking trajectory in the current frame if the probability is greater than a preset probability threshold.
5. The multi-target tracking method according to claim 1, wherein before the multi-target tracking trajectory is obtained in S4, the method further comprises detecting missed targets in the current-frame image data by a single-target tracking method.
6. The multi-target tracking method according to claim 1, wherein before the multi-target tracking trajectory is obtained in S4, the method further comprises processing missed targets in consecutive frames of image data, specifically:
based on the extracted appearance features and bounding-box features, computing the cost between the missed target and each target in the multi-target tracking trajectories; if the cost between the missed target and a certain target in the trajectories is smaller than a preset cost threshold, matching the missed target with that target, under the constraint that one multi-target tracking trajectory is associated with at most one missed target and one missed target with at most one trajectory.
7. The multi-target tracking method according to claim 1, wherein the bounding-box processing in S1 comprises determining the height of the bounding box, the width of the bounding box, the center of the bounding box, and the frame index.
8. A multi-target tracking system, comprising:
an image preprocessing module: the method comprises the steps of acquiring image data of a plurality of adjacent frames in a video, and carrying out boundary frame processing on targets in the image data of the plurality of adjacent frames;
the preliminary tracking track generation module: the method comprises the steps of convolving image data under a plurality of adjacent frames processed by a boundary frame to extract appearance characteristics and boundary frame characteristics, respectively detecting and correlating the appearance characteristics and the boundary frame characteristics extracted from the plurality of adjacent frames to obtain a plurality of tracks, and screening the plurality of tracks by Top-K scores based on the extracted appearance characteristics and the boundary frame characteristics to obtain a preliminary tracking track;
the construction module of the graph: based on the preliminary tracking track, constructing a graph by taking boundary frame features as motion features and appearance features as visual features, wherein the motion features are taken as features of edges of the constructed graph, the visual features are taken as features of nodes of the constructed graph, and if the two nodes meet all the following conditions:
(1) The distance between the center coordinates of the two nodes is smaller than a preset distance;
(2) The cosine similarity between the features of the two nodes is greater than a cosine similarity threshold;
(3) The intersection-over-union of the bounding boxes of the two nodes is greater than an intersection-over-union threshold;
the two nodes are connected by an edge to obtain an updated track (an illustrative sketch follows this claim);
a multi-target tracking track generation module: configured to update the graph based on the updated track; to aggregate, respectively, the features of the connected nodes and the features of the connected edges in the adjacent past frames and adjacent future frames of the current frame, and then embed them with the features of the nodes and edges of the current frame as the updated node and edge features of the current frame; and to increment the current frame by one and return to the image preprocessing module until all frames in the video have been processed, thereby obtaining the multi-target tracking track.
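The sketch below (not part of the patent text) illustrates the three edge-connection conditions and the neighbour aggregation performed by the last two modules; the node layout, mean aggregation, and concatenation embedding are assumptions, since the claim does not fix these details:

```python
import numpy as np

def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / (union + 1e-12)

def should_connect(na, nb, dist_thresh, cos_thresh, iou_thresh):
    """All three claimed conditions must hold before an edge is added.
    Node layout assumed: {'center': (x, y), 'feat': ndarray, 'box': (x1, y1, x2, y2)}."""
    d = np.hypot(na['center'][0] - nb['center'][0],
                 na['center'][1] - nb['center'][1])
    cos = float(np.dot(na['feat'], nb['feat'])
                / (np.linalg.norm(na['feat']) * np.linalg.norm(nb['feat']) + 1e-12))
    return d < dist_thresh and cos > cos_thresh and iou(na['box'], nb['box']) > iou_thresh

def aggregate(node_feats, edge_feats, neighbors):
    """Mean-aggregate the features of connected nodes and edges from the
    adjacent past and future frames, then embed them together with each
    node's own features (concatenation here is purely illustrative)."""
    updated = {}
    for v, nbrs in neighbors.items():       # nbrs: list of (node_id, edge_id)
        if not nbrs:                        # isolated node: keep its own features
            updated[v] = node_feats[v]
            continue
        n_agg = np.mean([node_feats[u] for u, _ in nbrs], axis=0)
        e_agg = np.mean([edge_feats[e] for _, e in nbrs], axis=0)
        updated[v] = np.concatenate([node_feats[v], n_agg, e_agg])
    return updated
```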
9. A multi-target tracking device comprising a processor and a memory, wherein the processor implements the multi-target tracking method of any of claims 1-7 when executing a computer program stored in the memory.
10. A multi-target tracking storage medium storing a computer program, wherein the computer program when executed by a processor implements the multi-target tracking method of any of claims 1-7.
CN202410262998.6A 2024-03-08 2024-03-08 Multi-target tracking method, system, equipment and storage medium Active CN117853759B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410262998.6A CN117853759B (en) 2024-03-08 2024-03-08 Multi-target tracking method, system, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN117853759A true CN117853759A (en) 2024-04-09
CN117853759B CN117853759B (en) 2024-05-10

Family

ID=90540523

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410262998.6A Active CN117853759B (en) 2024-03-08 2024-03-08 Multi-target tracking method, system, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117853759B (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016034008A1 (en) * 2014-09-04 2016-03-10 Huawei Technologies Co., Ltd. Target tracking method and device
WO2020232909A1 (en) * 2019-05-20 2020-11-26 Ping An Technology (Shenzhen) Co., Ltd. Pedestrian visual tracking method, model training method and device, apparatus and storage medium
CN110782483A (en) * 2019-10-23 2020-02-11 Shandong University Multi-view multi-target tracking method and system based on distributed camera network
WO2022217840A1 (en) * 2021-04-15 2022-10-20 Nanjing LES Electronic Equipment Co., Ltd. Method for high-precision multi-target tracking against complex background
CN115457082A (en) * 2022-09-01 2022-12-09 Xiangtan University A Pedestrian Multi-target Tracking Algorithm Based on Multi-Feature Fusion Enhancement
CN115359407A (en) * 2022-09-02 2022-11-18 Hohai University A Multi-Vehicle Tracking Method in Video
CN115861386A (en) * 2022-12-12 2023-03-28 Huazhong University of Science and Technology UAV multi-target tracking method and device through divide-and-conquer association
CN116403139A (en) * 2023-03-24 2023-07-07 Electric Power Research Institute of State Grid Jiangsu Electric Power Co., Ltd. A Visual Tracking and Localization Method Based on Target Detection
CN116681728A (en) * 2023-06-09 2023-09-01 South-Central Minzu University Multi-target tracking method and system based on Transformer and graph embedding

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
REN JIAMIN; GONG NINGSHENG; HAN ZHENYANG: "Multi-target tracking algorithm based on YOLOv3 and Kalman filtering", Computer Applications and Software, no. 05, 12 May 2020 (2020-05-12) *
LIU YUJIE; DOU CHANGHONG; ZHAO QILU; LI ZONGMIN: "Online multi-target tracking based on state prediction and motion structure", Journal of Computer-Aided Design & Computer Graphics, no. 02, 15 February 2018 (2018-02-15) *
SUN ZHIHAI; ZHU SHAN'AN: "Real-time segmentation and tracking of moving objects in multiple videos", Journal of Zhejiang University (Engineering Science), no. 09, 15 September 2008 (2008-09-15) *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118587252A (en) * 2024-07-25 2024-09-03 Xiamen Reconova Information Technology Co., Ltd. Multi-target tracking method, device and storage medium based on appearance feature quality screening

Also Published As

Publication number Publication date
CN117853759B (en) 2024-05-10

Similar Documents

Publication Publication Date Title
Kalake et al. Analysis based on recent deep learning approaches applied in real-time multi-object tracking: a review
Zhao et al. Segmentation and tracking of multiple humans in crowded environments
Hu et al. Principal axis-based correspondence between multiple cameras for people tracking
Babaee et al. A dual cnn–rnn for multiple people tracking
CN115995063A (en) Work vehicle detection and tracking method and system
CN103729861B (en) A kind of multi-object tracking method
CN106373146B (en) A Target Tracking Method Based on Fuzzy Learning
CN112132873B (en) A multi-lens pedestrian recognition and tracking method based on computer vision
CN107145862A (en) A Multi-feature Matching Multi-Target Tracking Method Based on Hough Forest
CN118297984B (en) Multi-target tracking method and system for smart city camera
CN115830075A (en) Hierarchical association matching method for pedestrian multi-target tracking
CN117541994A (en) Abnormal behavior detection model and detection method in dense multi-person scene
CN117853759B (en) Multi-target tracking method, system, equipment and storage medium
An et al. Anomalies detection and tracking using Siamese neural networks
CN116958872A (en) Intelligent auxiliary training method and system for badminton
Saleh et al. Artist: Autoregressive trajectory inpainting and scoring for tracking
CN115761568A (en) A Macaque Detection Method Based on YOLOv7 Network and Deepsort Network
Chen et al. Multiperson tracking by online learned grouping model with nonlinear motion context
Santoro et al. Crowd analysis by using optical flow and density based clustering
CN118038341B (en) Multi-target tracking method, device, computer equipment and storage medium
Liu et al. Multi-view vehicle detection and tracking in crossroads
Narayan et al. Learning deep features for online person tracking using non-overlapping cameras: A survey
Gao et al. Beyond group: Multiple person tracking via minimal topology-energy-variation
Bou et al. Reviewing ViBe, a popular background subtraction algorithm for real-time applications
CN113112479A (en) Progressive target detection method and device based on key block extraction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant