CN118115755B - Multi-target tracking method, system and storage medium - Google Patents
- Publication number
- CN118115755B (application CN202410517072.7A)
- Authority
- CN
- China
- Prior art keywords
- target
- lost
- frame
- target frame
- newly
- Prior art date
- Legal status
- Active
Classifications
- G06V10/62—Extraction of image or video features relating to a temporal dimension, e.g. time-based feature extraction; Pattern tracking
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
- G06N3/0464—Convolutional networks [CNN, ConvNet]
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
- G06V10/464—Salient features, e.g. scale invariant feature transforms [SIFT], using a plurality of salient features, e.g. bag-of-words [BoW] representations
- G06V10/50—Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis
- G06V10/757—Matching configurations of points or features
- G06V10/761—Proximity, similarity or dissimilarity measures
- G06V10/764—Image or video recognition or understanding using classification, e.g. of video objects
- G06V10/82—Image or video recognition or understanding using neural networks
- G06T2207/10016—Video; Image sequence
- G06V2201/07—Target detection
Abstract
The invention discloses a multi-target tracking method, system and storage medium, belonging to the technical field of computer vision tracking. The method comprises the following steps: S1: an image acquisition stage; S2: a target detection and camera motion estimation stage; S3: a target motion compensation stage, in which motion compensation is performed on the first target frame coordinates pH according to the target frame set and the camera motion matrix M to obtain the second target frame coordinates pC, which are then assigned back to the target frame; S4: an adjacent-image-frame target association stage, in which a multi-target association tracking method is used to associate the target frames of adjacent image frames; S5: a multi-frame association recovery stage, in which a lost target set and a newly added target set are maintained and a loss recovery strategy is used to obtain the multi-target tracking result. Multi-frame association recovery solves the problem that existing adjacent-frame association methods cannot retrieve lost targets; it also reduces the feature matching space of multi-frame association recovery, so that targets lost for either a short or a long time can be retrieved quickly.
Description
Technical Field
The present invention relates to the field of computer vision tracking technologies, and in particular, to a multi-target tracking method, system, and storage medium.
Background
Multi-target tracking technology detects and tracks multiple targets in a video while keeping each target identity unique, and is widely applied in fields such as surveillance and autonomous driving. In recent years, target detection based on deep learning has developed rapidly, and the paradigm of first performing target detection and then performing multi-target association has become the mainstream multi-target tracking scheme. Target detection is mainly responsible for detecting information such as the position, size and confidence of targets in an image. Detectors such as the YOLO series, CenterNet and DETR achieve high detection accuracy in real time, which has promoted research on downstream tasks such as target tracking.
Multi-target association is mainly responsible for the data association of tracked targets across adjacent frames, and there are two main data association methods. The first is the position-and-motion-model method: under a uniform-motion assumption it predicts state information such as the position and speed of each track in the current frame with a Kalman filter, computes the IOU similarity between detections and predictions, and completes the association of tracks and detections with a Hungarian matching or greedy matching strategy. The second is the appearance-model method, which mainly solves the re-matching of targets that disappear due to long-term occlusion; it introduces deep-model features of the target and completes the association step by combining target feature similarity with a Hungarian matching or greedy matching strategy. Representative algorithms include the SORT, DeepSORT, ByteTrack and BoT-SORT multi-target tracking algorithms.
However, the complexity of multi-target tracking scenarios poses significant challenges for both target detection and multi-target association. Under occlusion and camera motion, target matching is often inaccurate. Methods based on the position and motion model only consider target association between adjacent frames, so tracking is lost when a target is occluded or the camera motion is too large. Methods based on the appearance model can re-match targets that have disappeared for a long time, but their feature extraction is slow and their re-matching space is large, so the deep-learning model cannot run in real time.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a multi-target tracking method, a multi-target tracking system and a storage medium.
The aim of the invention is realized by the following technical scheme: the first aspect of the present invention provides: a multi-target tracking method comprising the steps of:
S1: an image acquisition stage of acquiring image data from the optoelectronic device;
s2: in the target detection and camera motion estimation stage, a multi-target detection model is used for carrying out target detection on image data to obtain a target frame set; meanwhile, calculating a camera motion matrix M according to static region feature points of adjacent image frames in the image data;
s3: a target motion compensation stage, namely performing motion compensation on the first target frame coordinate pH according to the target frame set and the camera motion matrix M to obtain a second target frame coordinate pC, and then assigning the second target frame coordinate pC back to the target frame;
s4: a target association stage of adjacent image frames, wherein a multi-target association tracking method is used for associating target frames of the adjacent image frames;
S5: setting a lost target set and a newly added target set in a multi-frame association recovery stage, and obtaining a multi-target tracking result by using a lost recovery strategy;
s6: and a result output stage for outputting the multi-target tracking result by using the result output device.
Preferably, the multi-target detection model is a Faster R-CNN, YOLO or DETR model; the target frame set comprises a plurality of target frames; each target frame comprises target frame coordinates, a target frame category, a target frame confidence and a target frame tracking ID, wherein the target frame coordinates comprise the upper-left abscissa, upper-left ordinate, lower-right abscissa and lower-right ordinate of the target frame; and the initial value of the target frame tracking ID is -1.
Preferably, the camera motion matrix M is calculated by the following steps:
Static-region feature points are extracted, filtering out target frames whose preset category is dynamic; then the static-region feature points of adjacent image frames are matched and associated with a feature point matching method; then noise matching pairs from dynamic regions are removed with the RANSAC algorithm to obtain a static-region matching pair set; finally, the static-region matching pair set is used as the input of the OpenCV getAffineTransform() function to obtain the camera motion matrix M.
Preferably, the static-region feature points are ORB, SURF, SIFT or SuperPoint features; the feature point matching method is brute-force matching, approximate nearest-neighbor matching or SuperGlue matching; the model used by the RANSAC algorithm is a homography matrix H estimation model, an essential matrix E estimation model or a fundamental matrix F estimation model; and the camera motion matrix M is a matrix of 2 rows and 3 columns.
Preferably, the specific calculation formula of the motion compensation applies the 2x3 affine matrix M to both corners of the target frame (pC = M·pH in homogeneous coordinates):
xc1 = a11·x1 + a12·y1 + a13, yc1 = a21·x1 + a22·y1 + a23;
xc2 = a11·x2 + a12·y2 + a13, yc2 = a21·x2 + a22·y2 + a23;
wherein pH is the first target frame coordinates, M is the camera motion matrix comprising the six elements a11, a12, a13, a21, a22, a23, pC is the second target frame coordinates, x1 is the upper-left abscissa of the first target frame, y1 is the upper-left ordinate of the first target frame, x2 is the lower-right abscissa of the first target frame, y2 is the lower-right ordinate of the first target frame, xc1 is the upper-left abscissa of the second target frame, yc1 is the upper-left ordinate of the second target frame, xc2 is the lower-right abscissa of the second target frame, and yc2 is the lower-right ordinate of the second target frame.
Preferably, the adjacent image frames are the t-1 image frame and the t image frame; the multi-target association tracking method is the SORT, DeepSORT, ByteTrack or BoT-SORT algorithm;
after the target frames of adjacent image frames have been associated, a target frame tracking ID is assigned to each target frame in turn; for the initial image frame, tracking IDs are assigned in sequence from 0 according to the storage order of the target frames, and the lost target set and the newly added target set of the initial image frame are emptied;
for the t image frame, if a target frame of the t image frame has been associated with a target frame of the t-1 image frame, the target frame tracking ID of that t-1 image frame target frame is assigned to the t image frame target frame; if a target frame of the t image frame is not associated, its target frame tracking ID is assigned as the largest existing target frame tracking ID plus 1; meanwhile, the unassociated target frames of the t-1 image frame are added to the lost target set of the t image frame, and the unassociated target frames of the t image frame are added to the newly added target set of the t image frame.
Preferably, the loss recovery strategy comprises a short-time loss recovery strategy and a long-time loss recovery strategy; step S5, the multi-frame association recovery stage, further comprises: a lost target set management method, a newly added target set management method, a short-time loss judgment method, the short-time loss recovery strategy and the long-time loss recovery strategy;
The lost target set management method comprises a historical lost target coordinate updating stage, a lost target adding stage and a lost target deleting stage. First, in the historical lost target coordinate updating stage, the coordinates of each lost target frame in the lost target set are predicted and the prediction result is written back to the lost target frame; the prediction comprises the following steps: let the target frame coordinates of any lost target frame in the lost target set at time t be R(tx1, ty1, tx2, ty2), and model R as a function of (deltaT, vx1, vy1, alpha, W, H) as follows:
wherein deltaT is the time difference between the prediction moment and the moment the target was lost, vx1 is the historical moving speed of the upper-left abscissa of the lost target frame, vy1 is the historical moving speed of the lower-right ordinate of the lost target frame, alpha is the expansion coefficient of the rectangular frame with a value between 0 and 1, W is the width of the image frame in which the lost target frame is located, H is the height of that image frame, and the historical moving speed is the speed at the moment of loss, the average speed, the median speed or the maximum speed. Then, in the lost target adding stage, a lost feature vector is extracted with a feature extraction method from the historical target set of each lost target frame in the lost target set of the t image frame, and the lost feature vectors of the whole historical target set are stored as a historical feature library, where the historical target set is the set of target image regions cached before the lost target frame was lost. Next, the lost target frames of the t image frame, together with their historical feature libraries, are added to the lost target set. Finally, lost target frames whose time in the lost target set exceeds the maximum existence duration of lost targets are deleted;
The newly added target set management method comprises a historical newly added target coordinate updating stage, a newly added target adding stage and a newly added target deleting stage. First, in the historical newly added target coordinate updating stage, the coordinates of the newly added target frames in the newly added target set are updated: the latest target frame coordinates are looked up in the target frame set through the target frame tracking ID of each newly added target frame and written back. Then, in the newly added target adding stage, a newly added feature vector is extracted with the feature extraction method from each newly added target frame in the newly added target set of the t image frame. Next, the newly added target frames of the t image frame, together with their newly added feature vectors, are added to the newly added target set. Finally, newly added target frames whose time in the newly added target set exceeds the maximum existence duration of newly added targets are deleted;
The short-time loss judgment method comprises the following steps: each lost target frame in the lost target set is examined; all lost target frames whose loss duration is smaller than the loss duration threshold are marked as short-time lost target frames and stored in a short-time lost set; all lost target frames whose loss duration is greater than or equal to the loss duration threshold are marked as long-time lost target frames and stored in a long-time lost set;
The short-time loss recovery strategy comprises the following steps: first, judge whether the short-time lost set or the newly added target set is empty; if either is empty, no short-time lost target frame is recovered; if not, perform coordinate matching: predict the short-time lost target frame to obtain a target prediction frame, compare the coordinates of the target prediction frame with the coordinates of all newly added target frames in the newly added target set, and take the newly added target frames lying completely inside the target prediction frame as the candidate newly added target set; finally, perform feature matching: compute in turn the cosine similarity between the lost feature vector of each short-time lost target frame in the short-time lost set at the moment of loss and the newly added feature vector of each target frame in the candidate newly added target set, and recover the short-time lost target frame by the matching similarity, the formula being b = argmax_{bj∈Bc} (xa·xbj)/(‖xa‖‖xbj‖), wherein b is the target frame with the highest similarity in the candidate newly added target set, bj is a target frame in the candidate newly added target set, Bc is the candidate newly added target set, xa is the lost feature vector of the short-time lost target frame at the moment of loss, and xbj is the newly added feature vector of target frame bj; if the similarity of b is greater than or equal to the feature matching similarity threshold, the short-time lost target frame is retrieved: the target frame tracking ID of b is set to the target frame tracking ID of the short-time lost target frame, b is deleted from the newly added target set and the candidate newly added target set, and the short-time lost target frame is deleted from the lost target set; if the similarity of b is smaller than the feature matching similarity threshold, the short-time lost target frame is not retrieved;
The long-time loss recovery strategy comprises the following steps: first, judge whether the long-time lost set or the newly added target set is empty; if either is empty, no long-time lost target frame is retrieved; if not, perform feature library matching: using the history feature library of the long-time lost target frame al at the moment of loss, compute in turn the cosine similarity with the newly added feature vector of each target frame in the newly added target set, and retrieve the long-time lost target frame by the matching similarity, the formula being bl = argmax_{bk∈B} (xal·xbk)/(‖xal‖‖xbk‖), wherein bl is the target frame with the highest similarity in the newly added target set, bk is a target frame in the newly added target set, B is the newly added target set, al is a target frame in the long-time lost set, xal is a lost feature vector in the history feature library of the long-time lost target frame at the moment of loss, and xbk is the newly added feature vector of target frame bk; if the similarity of bl is greater than or equal to the feature matching similarity threshold, the long-time lost target frame is retrieved: the target frame tracking ID of bl is set to the target frame tracking ID of the long-time lost target frame, bl is deleted from the newly added target set, and the long-time lost target frame is deleted from the lost target set; if the similarity of bl is smaller than the feature matching similarity threshold, the long-time lost target frame is not retrieved.
Preferably, the feature extraction method is the HOG method, the bag-of-words (BoW) method, a VGG neural network model, a ResNet neural network model or a NetVLAD neural network model.
A second aspect of the invention provides: a multi-target tracking system for implementing any of the multi-target tracking methods described above, comprising:
The optoelectronic device is used for carrying an optoelectronic camera and transmitting the image data to the computing device in a wired or wireless transmission mode;
The computing device is used for carrying a computing force unit and a multi-target tracking module, wherein the multi-target tracking module is used for realizing multi-target tracking of the image data to obtain a multi-target tracking result and outputting the multi-target tracking result to the result output device;
The result output device is used for outputting a multi-target tracking result in a wired transmission mode or a wireless transmission mode;
the multi-target tracking module includes: an image acquisition unit for acquiring image data from the optoelectronic device;
the target detection and camera motion estimation unit is used for carrying out target detection on the image data by using the multi-target detection model to obtain a target frame set; meanwhile, calculating a camera motion matrix M according to static region feature points of adjacent image frames in the image data;
The target motion compensation unit is used for performing motion compensation on the first target frame coordinate pH according to the target frame set and the camera motion matrix M to obtain a second target frame coordinate pC, and then assigning the second target frame coordinate pC back to the target frame;
The adjacent image frame target association unit is used for associating target frames of the adjacent image frames by using a multi-target association tracking method;
And the multi-frame association recovery unit is used for setting a lost target set and a newly added target set, and obtaining a multi-target tracking result by using a lost recovery strategy.
A third aspect of the invention provides: a computer storage medium having stored therein computer executable instructions that when loaded and executed by a processor implement any of the multi-objective tracking methods described above.
The beneficial effects of the invention are as follows:
1) Multi-frame association retrieval solves the problem that existing adjacent-frame association methods cannot retrieve lost targets; meanwhile, the designed maintenance strategies for the lost target set and the newly added target set reduce the feature matching space of multi-frame association retrieval, so that targets lost for either a short or a long time can be retrieved quickly.
2) Target motion compensation corrects the erroneous offset of image targets caused by camera motion; at the same time, placing target motion compensation after target detection, rather than inside the adjacent-image-frame target association stage, means it can be combined with existing adjacent-frame target association methods without any modification.
3) Target detection and camera motion estimation are completely independent; computing them in parallel saves calculation time and leaves them unconstrained by each other.
Drawings
FIG. 1 is a flow chart of a multi-objective tracking method;
FIG. 2 is a block diagram of a multi-target tracking system;
FIG. 3 is a multi-frame association recovery flow chart.
Detailed Description
The technical solutions of the present invention will be clearly and completely described below with reference to the embodiments. It is apparent that the described embodiments are only some embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by a person skilled in the art without inventive effort fall within the scope of the present invention.
In the present invention, MOT: Multi-Object Tracking; Kalman filtering: the Kalman filter algorithm, which can predict the position of an image target; RANSAC algorithm: Random Sample Consensus, an algorithm that estimates the parameters of a mathematical model from a set of sample data containing outliers so as to obtain valid sample data; OpenCV library: a library of general-purpose algorithms for image processing and computer vision.
Referring to fig. 1-3, a first aspect of the present invention provides: a multi-target tracking method comprising the steps of:
S1: an image acquisition stage of acquiring image data from the optoelectronic device;
s2: in the target detection and camera motion estimation stage, a multi-target detection model is used for carrying out target detection on image data to obtain a target frame set; meanwhile, calculating a camera motion matrix M according to static region feature points of adjacent image frames in the image data;
s3: a target motion compensation stage, namely performing motion compensation on the first target frame coordinate pH according to the target frame set and the camera motion matrix M to obtain a second target frame coordinate pC, and then assigning the second target frame coordinate pC back to the target frame;
s4: a target association stage of adjacent image frames, wherein a multi-target association tracking method is used for associating target frames of the adjacent image frames;
S5: setting a lost target set and a newly added target set in a multi-frame association recovery stage, and obtaining a multi-target tracking result by using a lost recovery strategy;
s6: and a result output stage for outputting the multi-target tracking result by using the result output device.
In some embodiments, the multi-target detection model is a Faster R-CNN, YOLO or DETR model; the target frame set comprises a plurality of target frames; each target frame comprises target frame coordinates, a target frame category, a target frame confidence and a target frame tracking ID, wherein the target frame coordinates comprise the upper-left abscissa, upper-left ordinate, lower-right abscissa and lower-right ordinate of the target frame; and the initial value of the target frame tracking ID is -1.
In some embodiments, the camera motion matrix M is calculated by:
Static-region feature points are extracted, filtering out target frames whose preset category is dynamic; then the static-region feature points of adjacent image frames are matched and associated with a feature point matching method; then noise matching pairs from dynamic regions are removed with the RANSAC algorithm to obtain a static-region matching pair set; finally, the static-region matching pair set is used as the input of the OpenCV getAffineTransform() function to obtain the camera motion matrix M.
In this embodiment, the preset category is set manually in advance, for example, a vehicle, a person and the like are dynamic categories, and a house, a bridge and the like are static categories; filtering the dynamic target box may reduce interference of the dynamic target.
In some embodiments, the static-region feature points are ORB, SURF, SIFT or SuperPoint features; the feature point matching method is brute-force matching, approximate nearest-neighbor matching or SuperGlue matching; the model used by the RANSAC algorithm is a homography matrix H estimation model, an essential matrix E estimation model or a fundamental matrix F estimation model; and the camera motion matrix M is a matrix of 2 rows and 3 columns.
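For illustration only (not part of the patent), the following Python/OpenCV sketch shows one way this step could be realised, assuming the ORB-feature, brute-force-matching and homography-model RANSAC options listed above; names such as camera_motion_matrix and dynamic_boxes are placeholders.

```python
import cv2
import numpy as np

IDENTITY_M = np.float32([[1, 0, 0], [0, 1, 0]])  # "no camera motion" fallback

def in_dynamic_box(pt, dynamic_boxes):
    """True if a keypoint lies inside any dynamic-category target frame (x1, y1, x2, y2)."""
    x, y = pt
    return any(x1 <= x <= x2 and y1 <= y <= y2 for x1, y1, x2, y2 in dynamic_boxes)

def camera_motion_matrix(prev_gray, cur_gray, dynamic_boxes):
    # Static-region feature points: ORB keypoints that do not fall inside dynamic target frames.
    orb = cv2.ORB_create(1000)
    kp1, des1 = orb.detectAndCompute(prev_gray, None)
    kp2, des2 = orb.detectAndCompute(cur_gray, None)
    if des1 is None or des2 is None:
        return IDENTITY_M

    # Brute-force matching of the adjacent-frame descriptors.
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)
    pts1, pts2 = [], []
    for m in matches:
        p1, p2 = kp1[m.queryIdx].pt, kp2[m.trainIdx].pt
        if not in_dynamic_box(p1, dynamic_boxes) and not in_dynamic_box(p2, dynamic_boxes):
            pts1.append(p1)
            pts2.append(p2)
    if len(pts1) < 4:
        return IDENTITY_M
    pts1, pts2 = np.float32(pts1), np.float32(pts2)

    # RANSAC (homography model) drops residual noise matches from moving regions.
    _, mask = cv2.findHomography(pts1, pts2, cv2.RANSAC, 3.0)
    if mask is None:
        return IDENTITY_M
    inliers = mask.ravel().astype(bool)
    src, dst = pts1[inliers], pts2[inliers]
    if len(src) < 3:
        return IDENTITY_M

    # Three well-spread inlier pairs give the 2x3 affine camera motion matrix M,
    # mirroring the getAffineTransform() step described above.
    idx = np.linspace(0, len(src) - 1, 3).astype(int)
    return np.float32(cv2.getAffineTransform(src[idx], dst[idx]))
```

In practice cv2.estimateAffine2D(pts1, pts2, method=cv2.RANSAC) would combine the RANSAC filtering and the affine fit in one call; the split above simply follows the order of the steps in the text.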
In some embodiments, the motion compensation is specifically calculated by applying the 2x3 affine matrix M to both corners of the target frame (pC = M·pH in homogeneous coordinates):
xc1 = a11·x1 + a12·y1 + a13, yc1 = a21·x1 + a22·y1 + a23;
xc2 = a11·x2 + a12·y2 + a13, yc2 = a21·x2 + a22·y2 + a23;
wherein pH is the first target frame coordinates, M is the camera motion matrix comprising the six elements a11, a12, a13, a21, a22, a23, pC is the second target frame coordinates, x1 is the upper-left abscissa of the first target frame, y1 is the upper-left ordinate of the first target frame, x2 is the lower-right abscissa of the first target frame, y2 is the lower-right ordinate of the first target frame, xc1 is the upper-left abscissa of the second target frame, yc1 is the upper-left ordinate of the second target frame, xc2 is the lower-right abscissa of the second target frame, and yc2 is the lower-right ordinate of the second target frame.
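As a minimal illustrative sketch (not part of the patent), the compensation above can be applied to a whole detection set at once, assuming the target frames are stored as an N x 4 array of (x1, y1, x2, y2) rows; the function name compensate_boxes is an assumption.

```python
import numpy as np

def compensate_boxes(boxes, M):
    """Apply the 2x3 camera motion matrix M to the first target frame coordinates pH,
    returning the second target frame coordinates pC (both as N x 4 arrays)."""
    boxes = np.asarray(boxes, dtype=np.float32)
    a11, a12, a13 = M[0]
    a21, a22, a23 = M[1]
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    xc1 = a11 * x1 + a12 * y1 + a13
    yc1 = a21 * x1 + a22 * y1 + a23
    xc2 = a11 * x2 + a12 * y2 + a13
    yc2 = a21 * x2 + a22 * y2 + a23
    return np.stack([xc1, yc1, xc2, yc2], axis=1)
```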
In some embodiments, the adjacent image frames are t-1 image frames and t image frames; the multi-target association tracking method is a Sort algorithm or DeepSort algorithm or ByteTrack algorithm or BoT-Sort algorithm;
assigning a target frame tracking ID of each target frame in sequence after the target frames of adjacent image frames are associated, assigning tracking IDs of the target frames in sequence from 0 according to the storage sequence of the target frames for the initial image frames, and simultaneously emptying a lost target set of the initial image frames and a newly added target set of the initial image frames;
For the t image frame, if the target frame of the t image frame is already associated with the target frame of the t-1 image frame, assigning a target frame tracking ID of the target frame of the t-1 image frame to the target frame of the t image frame; if the target frame tracking ID of the target frame of the t image frame is not associated, the target frame tracking ID of the target frame of the t image frame is assigned to be the largest target frame tracking ID added with 1, meanwhile, the target frame of the t-1 image frame is added to the lost target set of the t image frame, and the target frame of the t image frame is added to the newly added target set of the t image frame.
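A rough Python sketch of the ID bookkeeping just described, for illustration only; the dict-based data structures and the function name assign_ids are assumptions, not specified by the patent.

```python
def assign_ids(prev_tracks, cur_boxes, matches, next_id, lost_set, new_set):
    """Assign tracking IDs after adjacent-frame association.

    prev_tracks: target frames of the t-1 image frame, each a dict with an 'id'.
    cur_boxes:   target frames of the t image frame, 'id' initialised to -1.
    matches:     (prev_index, cur_index) pairs produced by the association step.
    next_id:     1 + the largest tracking ID assigned so far.
    """
    matched_prev = {i for i, _ in matches}
    matched_cur = {j for _, j in matches}

    # Associated target frames inherit the t-1 frame's tracking ID.
    for i, j in matches:
        cur_boxes[j]['id'] = prev_tracks[i]['id']

    # Unassociated t-frame target frames get a fresh ID and join the newly added target set.
    for j, box in enumerate(cur_boxes):
        if j not in matched_cur:
            box['id'] = next_id
            next_id += 1
            new_set.append(box)

    # Unassociated t-1 target frames join the lost target set.
    for i, trk in enumerate(prev_tracks):
        if i not in matched_prev:
            lost_set.append(trk)

    return next_id
```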
In some embodiments, the loss recovery strategy includes a short-time loss recovery strategy and a long-time loss recovery strategy; the S5: the multi-frame association recovery stage further comprises: a lost target set management method, a newly added target set management method, a short-time loss judgment method, a short-time loss recovery strategy and a long-time loss recovery strategy;
The lost target set management method comprises a historical lost target coordinate updating stage, a lost target adding stage and a lost target deleting stage. First, in the historical lost target coordinate updating stage, the coordinates of each lost target frame in the lost target set are predicted and the prediction result is written back to the lost target frame; the prediction comprises the following steps: let the target frame coordinates of any lost target frame in the lost target set at time t be R(tx1, ty1, tx2, ty2), and model R as a function of (deltaT, vx1, vy1, alpha, W, H) as follows:
wherein deltaT is the time difference between the prediction moment and the moment the target was lost, vx1 is the historical moving speed of the upper-left abscissa of the lost target frame, vy1 is the historical moving speed of the lower-right ordinate of the lost target frame, alpha is the expansion coefficient of the rectangular frame with a value between 0 and 1, W is the width of the image frame in which the lost target frame is located, H is the height of that image frame, and the historical moving speed is the speed at the moment of loss, the average speed, the median speed or the maximum speed. Then, in the lost target adding stage, a lost feature vector is extracted with a feature extraction method from the historical target set of each lost target frame in the lost target set of the t image frame, and the lost feature vectors of the whole historical target set are stored as a historical feature library, where the historical target set is the set of target image regions cached before the lost target frame was lost. Next, the lost target frames of the t image frame, together with their historical feature libraries, are added to the lost target set. Finally, lost target frames whose time in the lost target set exceeds the maximum existence duration of lost targets are deleted;
The new target set management method comprises a history new target coordinate updating stage, a new target stage and a new target deleting stage; firstly, in a history newly-increased target coordinate updating stage, updating newly-increased target frame coordinates in a newly-increased target set, searching the target frame coordinates of the newly-increased target frame at the latest moment in a target frame set through a target frame tracking ID of the newly-increased target frame, and updating; then, in the new target stage, extracting new feature vectors from each new target frame in the new target set of the t image frames by using a feature extraction method; then, the new added target set and the new added feature vector of the t image frame are newly added to the new added target set; finally, deleting the new target frame with the time length longer than the maximum existing time length of the new target in the new target set;
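For illustration only, the following sketch shows one possible update cycle of the newly added target set along the lines of the management method above; the dict layout {'id', 'box', 'feat', 't_added', 'is_new'} and the helper names are assumptions.

```python
def update_new_target_set(new_set, cur_boxes, extract_feat, t, max_age):
    """One update of the newly added target set (structure assumed, not from the patent).

    new_set:      list of dicts {'id', 'box', 'feat', 't_added'}.
    cur_boxes:    target frame set of the t image frame, each with 'id', 'box', 'is_new'.
    extract_feat: feature extraction function (e.g. a VGG16 embedding).
    """
    latest = {b['id']: b['box'] for b in cur_boxes}

    # 1) Historical coordinate update: refresh entries still visible at frame t,
    #    looked up through their target frame tracking ID.
    for entry in new_set:
        if entry['id'] in latest:
            entry['box'] = latest[entry['id']]

    # 2) Adding stage: extract feature vectors for this frame's newly added targets.
    for b in cur_boxes:
        if b.get('is_new'):
            new_set.append({'id': b['id'], 'box': b['box'],
                            'feat': extract_feat(b), 't_added': t})

    # 3) Deleting stage: drop entries older than the maximum existence duration.
    return [e for e in new_set if t - e['t_added'] <= max_age]
```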
The short-time loss judging method comprises the following steps: carrying out loss judgment on each lost target frame in the lost target set, marking all lost target frames with the loss time length smaller than the loss time length threshold value as short-time lost target frames, and storing the short-time lost frames into a short-time lost set; recording all lost target frames with the loss time length greater than or equal to the loss time length threshold value as long-time lost target frames and storing the long-time lost target frames into a long-time lost set;
The short-time loss recovery strategy comprises the following steps: first, judge whether the short-time lost set or the newly added target set is empty; if either is empty, no short-time lost target frame is recovered; if not, perform coordinate matching: predict the short-time lost target frame to obtain a target prediction frame, compare the coordinates of the target prediction frame with the coordinates of all newly added target frames in the newly added target set, and take the newly added target frames lying completely inside the target prediction frame as the candidate newly added target set; finally, perform feature matching: compute in turn the cosine similarity between the lost feature vector of each short-time lost target frame in the short-time lost set at the moment of loss and the newly added feature vector of each target frame in the candidate newly added target set, and recover the short-time lost target frame by the matching similarity, the formula being b = argmax_{bj∈Bc} (xa·xbj)/(‖xa‖‖xbj‖), wherein b is the target frame with the highest similarity in the candidate newly added target set, bj is a target frame in the candidate newly added target set, Bc is the candidate newly added target set, xa is the lost feature vector of the short-time lost target frame at the moment of loss, and xbj is the newly added feature vector of target frame bj; if the similarity of b is greater than or equal to the feature matching similarity threshold, the short-time lost target frame is retrieved: the target frame tracking ID of b is set to the target frame tracking ID of the short-time lost target frame, b is deleted from the newly added target set and the candidate newly added target set, and the short-time lost target frame is deleted from the lost target set; if the similarity of b is smaller than the feature matching similarity threshold, the short-time lost target frame is not retrieved;
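A minimal Python sketch of the short-time recovery for a single lost target, for illustration only; the dict keys ('feat', 'pred_box', 'id') and the similarity threshold are assumptions.

```python
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def inside(inner, outer):
    """True if box `inner` lies completely inside box `outer` (x1, y1, x2, y2)."""
    return (inner[0] >= outer[0] and inner[1] >= outer[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])

def recover_short_term(lost, new_set, sim_thresh):
    """Match one short-time lost target frame against the newly added target set.

    lost: dict with 'feat' (lost feature vector xa at the moment of loss),
          'pred_box' (target prediction frame) and 'id'.
    Returns the retrieved newly added target frame, or None.
    """
    # Coordinate matching: candidates must lie completely inside the prediction frame.
    candidates = [c for c in new_set if inside(c['box'], lost['pred_box'])]
    if not candidates:
        return None
    # Feature matching: b = argmax over candidates of cosine similarity.
    sims = [cosine(lost['feat'], c['feat']) for c in candidates]
    best = int(np.argmax(sims))
    if sims[best] >= sim_thresh:
        candidates[best]['id'] = lost['id']   # the retrieved target keeps the lost track's ID
        return candidates[best]
    return None
```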
The long-time loss recovery strategy comprises the following steps: first, judge whether the long-time lost set or the newly added target set is empty; if either is empty, no long-time lost target frame is retrieved; if not, perform feature library matching: using the history feature library of the long-time lost target frame al at the moment of loss, compute in turn the cosine similarity with the newly added feature vector of each target frame in the newly added target set, and retrieve the long-time lost target frame by the matching similarity, the formula being bl = argmax_{bk∈B} (xal·xbk)/(‖xal‖‖xbk‖), wherein bl is the target frame with the highest similarity in the newly added target set, bk is a target frame in the newly added target set, B is the newly added target set, al is a target frame in the long-time lost set, xal is a lost feature vector in the history feature library of the long-time lost target frame at the moment of loss, and xbk is the newly added feature vector of target frame bk; if the similarity of bl is greater than or equal to the feature matching similarity threshold, the long-time lost target frame is retrieved: the target frame tracking ID of bl is set to the target frame tracking ID of the long-time lost target frame, bl is deleted from the newly added target set, and the long-time lost target frame is deleted from the lost target set; if the similarity of bl is smaller than the feature matching similarity threshold, the long-time lost target frame is not retrieved.
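The corresponding long-time recovery can be sketched in the same style; taking the maximum similarity over the history feature library is an assumption here, since the text only states that the library is compared with each newly added feature vector.

```python
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def recover_long_term(lost, new_set, sim_thresh):
    """Match one long-time lost target frame against the newly added target set.

    lost: dict with 'feat_lib' (history feature library, a list of vectors) and 'id'.
    Returns the retrieved newly added target frame, or None.
    """
    best, best_sim = None, -1.0
    for cand in new_set:
        # Assumed aggregation: best similarity over all stored history vectors.
        sim = max(cosine(f, cand['feat']) for f in lost['feat_lib'])
        if sim > best_sim:
            best, best_sim = cand, sim
    if best is not None and best_sim >= sim_thresh:
        best['id'] = lost['id']   # the retrieved target keeps the lost track's ID
        return best
    return None
```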
In this embodiment, maintaining the lost target set and the newly added target set B greatly reduces the matching space for lost-target recovery. The short-time loss recovery strategy assumes that the appearance, shape and size of a briefly lost target change little, and makes full use of short-term multi-frame information by combining the predicted target frame coordinates with the target features at the moment of loss for matching and recovery. The long-time loss recovery strategy, because the appearance, shape and size of the target change greatly, builds a feature library from the features of historical targets and recovers lost targets based on feature library matching; meanwhile, limiting the sizes of the lost target set and the newly added target set keeps the feature matching space within a small range. The feature extraction method may be any image feature extraction method; in this embodiment, a VGG16 model is used to extract target feature vectors.
In some embodiments, the feature extraction method is the HOG method, the bag-of-words (BoW) method, a VGG neural network model, a ResNet neural network model or a NetVLAD neural network model.
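As an illustrative sketch of the VGG16-based option mentioned in this embodiment (not prescribed by the patent), assuming a recent torchvision; the 224x224 input size and the choice of the penultimate fully connected layer are assumptions.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T

# VGG16 backbone used as a fixed feature extractor.
_vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).eval()
_preprocess = T.Compose([
    T.ToPILImage(), T.Resize((224, 224)), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def extract_feature(crop_bgr):
    """Return an L2-normalised feature vector for a target image crop (H x W x 3, BGR uint8)."""
    rgb = crop_bgr[:, :, ::-1].copy()          # BGR -> RGB
    x = _preprocess(rgb).unsqueeze(0)
    f = _vgg.features(x)
    f = _vgg.avgpool(f).flatten(1)
    f = _vgg.classifier[:4](f)                 # stop before the final classification layer
    f = torch.nn.functional.normalize(f, dim=1)
    return f.squeeze(0).numpy()
```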
A second aspect of the invention provides: a multi-target tracking system for implementing any of the multi-target tracking methods described above, comprising:
The optoelectronic device is used for carrying an optoelectronic camera and transmitting the image data to the computing device in a wired or wireless transmission mode;
The computing device is used for carrying a computing force unit and a multi-target tracking module, wherein the multi-target tracking module is used for realizing multi-target tracking of the image data to obtain a multi-target tracking result and outputting the multi-target tracking result to the result output device;
The result output device is used for outputting a multi-target tracking result in a wired transmission mode or a wireless transmission mode;
the multi-target tracking module includes: an image acquisition unit for acquiring image data from the optoelectronic device;
the target detection and camera motion estimation unit is used for carrying out target detection on the image data by using the multi-target detection model to obtain a target frame set; meanwhile, calculating a camera motion matrix M according to static region feature points of adjacent image frames in the image data;
The target motion compensation unit is used for performing motion compensation on the first target frame coordinate pH according to the target frame set and the camera motion matrix M to obtain a second target frame coordinate pC, and then assigning the second target frame coordinate pC back to the target frame;
The adjacent image frame target association unit is used for associating target frames of the adjacent image frames by using a multi-target association tracking method;
And the multi-frame association recovery unit is used for setting a lost target set and a newly added target set, and obtaining a multi-target tracking result by using a lost recovery strategy.
A third aspect of the invention provides: a computer storage medium having stored therein computer executable instructions that when loaded and executed by a processor implement any of the multi-objective tracking methods described above.
The foregoing is merely a preferred embodiment of the invention. It is to be understood that the invention is not limited to the forms disclosed herein and is not to be construed as excluding other embodiments; it may be used in various other combinations, modifications and environments, and may be modified within the scope of the inventive concept described herein, whether by the above teachings or by the skill or knowledge of the relevant art. Modifications and variations that do not depart from the spirit and scope of the invention are intended to be within the scope of the appended claims.
Claims (9)
1. A multi-target tracking method is characterized in that: the method comprises the following steps:
S1: an image acquisition stage of acquiring image data from the optoelectronic device;
s2: in the target detection and camera motion estimation stage, a multi-target detection model is used for carrying out target detection on image data to obtain a target frame set; meanwhile, calculating a camera motion matrix M according to static region feature points of adjacent image frames in the image data;
s3: a target motion compensation stage, namely performing motion compensation on the first target frame coordinate pH according to the target frame set and the camera motion matrix M to obtain a second target frame coordinate pC, and then assigning the second target frame coordinate pC back to the target frame;
s4: a target association stage of adjacent image frames, wherein a multi-target association tracking method is used for associating target frames of the adjacent image frames;
S5: setting a lost target set and a newly added target set in a multi-frame association recovery stage, and obtaining a multi-target tracking result by using a lost recovery strategy;
s6: a result output stage for outputting a multi-target tracking result by using a result output device;
The loss recovery strategy comprises a short-time loss recovery strategy and a long-time loss recovery strategy; the S5: the multi-frame association recovery stage further comprises: a lost target set management method, a newly added target set management method, a short-time loss judgment method, a short-time loss recovery strategy and a long-time loss recovery strategy;
The lost target set management method comprises a historical lost target coordinate updating stage, a lost target adding stage and a lost target deleting stage. First, in the historical lost target coordinate updating stage, the coordinates of each lost target frame in the lost target set are predicted and the prediction result is written back to the lost target frame; the prediction comprises the following steps: let the target frame coordinates of any lost target frame in the lost target set at time t be R(tx1, ty1, tx2, ty2), and model R as a function of (deltaT, vx1, vy1, alpha, W, H) as follows:
wherein deltaT is the time difference between the prediction moment and the moment the target was lost, vx1 is the historical moving speed of the upper-left abscissa of the lost target frame, vy1 is the historical moving speed of the lower-right ordinate of the lost target frame, alpha is the expansion coefficient of the rectangular frame with a value between 0 and 1, W is the width of the image frame in which the lost target frame is located, H is the height of that image frame, and the historical moving speed is the speed at the moment of loss, the average speed, the median speed or the maximum speed. Then, in the lost target adding stage, a lost feature vector is extracted with a feature extraction method from the historical target set of each lost target frame in the lost target set of the t image frame, and the lost feature vectors of the whole historical target set are stored as a historical feature library, where the historical target set is the set of target image regions cached before the lost target frame was lost. Next, the lost target frames of the t image frame, together with their historical feature libraries, are added to the lost target set. Finally, lost target frames whose time in the lost target set exceeds the maximum existence duration of lost targets are deleted;
The new target set management method comprises a history new target coordinate updating stage, a new target stage and a new target deleting stage; firstly, in a history newly-increased target coordinate updating stage, updating newly-increased target frame coordinates in a newly-increased target set, searching the target frame coordinates of the newly-increased target frame at the latest moment in a target frame set through a target frame tracking ID of the newly-increased target frame, and updating; then, in the new target stage, extracting new feature vectors from each new target frame in the new target set of the t image frames by using a feature extraction method; then, the new added target set and the new added feature vector of the t image frame are newly added to the new added target set; finally, deleting the new target frame with the time length longer than the maximum existing time length of the new target in the new target set;
The short-time loss judging method comprises the following steps: carrying out loss judgment on each lost target frame in the lost target set, marking all lost target frames with the loss time length smaller than the loss time length threshold value as short-time lost target frames, and storing the short-time lost frames into a short-time lost set; recording all lost target frames with the loss time length greater than or equal to the loss time length threshold value as long-time lost target frames and storing the long-time lost target frames into a long-time lost set;
The short-time loss recovery strategy comprises the following steps: first, judge whether the short-time lost set or the newly added target set is empty; if either is empty, no short-time lost target frame is recovered; if not, perform coordinate matching: predict the short-time lost target frame to obtain a target prediction frame, compare the coordinates of the target prediction frame with the coordinates of all newly added target frames in the newly added target set, and take the newly added target frames lying completely inside the target prediction frame as the candidate newly added target set; finally, perform feature matching: compute in turn the cosine similarity between the lost feature vector of each short-time lost target frame in the short-time lost set at the moment of loss and the newly added feature vector of each target frame in the candidate newly added target set, and recover the short-time lost target frame by the matching similarity, the formula being b = argmax_{bj∈Bc} (xa·xbj)/(‖xa‖‖xbj‖), wherein b is the target frame with the highest similarity in the candidate newly added target set, bj is a target frame in the candidate newly added target set, Bc is the candidate newly added target set, xa is the lost feature vector of the short-time lost target frame at the moment of loss, and xbj is the newly added feature vector of target frame bj; if the similarity of b is greater than or equal to the feature matching similarity threshold, the short-time lost target frame is retrieved: the target frame tracking ID of b is set to the target frame tracking ID of the short-time lost target frame, b is deleted from the newly added target set and the candidate newly added target set, and the short-time lost target frame is deleted from the lost target set; if the similarity of b is smaller than the feature matching similarity threshold, the short-time lost target frame is not retrieved;
The long-time loss recovery strategy comprises the following steps: first, judge whether the long-time lost set or the newly added target set is empty; if either is empty, no long-time lost target frame is retrieved; if not, perform feature library matching: using the history feature library of the long-time lost target frame al at the moment of loss, compute in turn the cosine similarity with the newly added feature vector of each target frame in the newly added target set, and retrieve the long-time lost target frame by the matching similarity, the formula being bl = argmax_{bk∈B} (xal·xbk)/(‖xal‖‖xbk‖), wherein bl is the target frame with the highest similarity in the newly added target set, bk is a target frame in the newly added target set, B is the newly added target set, al is a target frame in the long-time lost set, xal is a lost feature vector in the history feature library of the long-time lost target frame at the moment of loss, and xbk is the newly added feature vector of target frame bk; if the similarity of bl is greater than or equal to the feature matching similarity threshold, the long-time lost target frame is retrieved: the target frame tracking ID of bl is set to the target frame tracking ID of the long-time lost target frame, bl is deleted from the newly added target set, and the long-time lost target frame is deleted from the lost target set; if the similarity of bl is smaller than the feature matching similarity threshold, the long-time lost target frame is not retrieved.
2. The multi-target tracking method of claim 1, wherein: the multi-target detection model is a Faster R-CNN, YOLO or DETR model; the target frame set comprises a plurality of target frames; each target frame comprises target frame coordinates, a target frame category, a target frame confidence and a target frame tracking ID, wherein the target frame coordinates comprise the upper-left abscissa, upper-left ordinate, lower-right abscissa and lower-right ordinate of the target frame; and the initial value of the target frame tracking ID is -1.
3. The multi-target tracking method of claim 1, wherein: the camera motion matrix M is calculated by the following steps:
Static-region feature points are extracted, filtering out target frames whose preset category is dynamic; then the static-region feature points of adjacent image frames are matched and associated with a feature point matching method; then noise matching pairs from dynamic regions are removed with the RANSAC algorithm to obtain a static-region matching pair set; finally, the static-region matching pair set is used as the input of the OpenCV getAffineTransform() function to obtain the camera motion matrix M.
4. The multi-target tracking method according to claim 3, characterized in that: the static-region feature points are ORB, SURF, SIFT or SuperPoint features; the feature point matching method is brute-force matching, approximate nearest-neighbor matching or SuperGlue matching; the model used by the RANSAC algorithm is a homography matrix H estimation model, an essential matrix E estimation model or a fundamental matrix F estimation model; and the camera motion matrix M is a matrix of 2 rows and 3 columns.
5. The multi-target tracking method of claim 1, wherein: the motion compensation is calculated as follows:
xc1 = a11·x1 + a12·y1 + a13,  yc1 = a21·x1 + a22·y1 + a23,
xc2 = a11·x2 + a12·y2 + a13,  yc2 = a21·x2 + a22·y2 + a23,
wherein pH = (x1, y1, x2, y2) is the first target frame coordinate, M is the camera motion matrix comprising the six elements a11, a12, a13, a21, a22, a23, pC = (xc1, yc1, xc2, yc2) is the second target frame coordinate, x1 is the upper-left-corner abscissa of the first target frame, y1 is the upper-left-corner ordinate of the first target frame, x2 is the lower-right-corner abscissa of the first target frame, y2 is the lower-right-corner ordinate of the first target frame, xc1 is the upper-left-corner abscissa of the second target frame, yc1 is the upper-left-corner ordinate of the second target frame, xc2 is the lower-right-corner abscissa of the second target frame, and yc2 is the lower-right-corner ordinate of the second target frame.
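A small sketch of this compensation applied to one target frame, assuming boxes are stored as (x1, y1, x2, y2) tuples and M is any 2x3 matrix such as the one estimated above:

```python
def compensate_box(pH, M):
    """Apply the 2x3 camera motion matrix M to a box pH = (x1, y1, x2, y2)
    and return the compensated box pC = (xc1, yc1, xc2, yc2)."""
    x1, y1, x2, y2 = pH
    a11, a12, a13 = M[0]
    a21, a22, a23 = M[1]
    xc1 = a11 * x1 + a12 * y1 + a13
    yc1 = a21 * x1 + a22 * y1 + a23
    xc2 = a11 * x2 + a12 * y2 + a13
    yc2 = a21 * x2 + a22 * y2 + a23
    return (xc1, yc1, xc2, yc2)
```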
6. The multi-target tracking method of claim 1, wherein: the adjacent image frames are the (t-1)-th image frame and the t-th image frame; the multi-target association tracking method is the SORT algorithm, the DeepSORT algorithm, the ByteTrack algorithm, or the BoT-SORT algorithm;
after the target frames of adjacent image frames are associated, a target frame tracking ID is assigned to each target frame in turn; for the initial image frame, tracking IDs are assigned to the target frames in storage order starting from 0, and the lost target set and the newly added target set of the initial image frame are emptied;
for the t-th image frame, if a target frame of the t-th image frame has been associated with a target frame of the (t-1)-th image frame, the target frame tracking ID of the target frame of the (t-1)-th image frame is assigned to the target frame of the t-th image frame; if a target frame of the t-th image frame is not associated, its target frame tracking ID is assigned as the largest existing target frame tracking ID plus 1, the unassociated target frame of the (t-1)-th image frame is added to the lost target set of the t-th image frame, and the target frame of the t-th image frame is added to the newly added target set of the t-th image frame.
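The bookkeeping in claim 6 can be sketched as follows, assuming the association step has already produced a mapping from current detections to previous-frame track IDs; all names (assoc, prev_tracks, track_id, the dictionary-based target frames) are illustrative and not part of the claims.

```python
def assign_track_ids(assoc, prev_tracks, detections, next_id):
    """assoc       : dict mapping detection index -> previous-frame track ID,
                     present only for detections the association step matched
    prev_tracks   : dict of previous-frame track ID -> target frame
    detections    : list of current-frame target frames (track_id == -1 initially)
    next_id       : largest track ID assigned so far + 1
    Returns (lost_set, new_set, next_id)."""
    matched_prev = set()
    new_set = []
    for i, det in enumerate(detections):
        if i in assoc:                     # associated: inherit previous-frame ID
            det["track_id"] = assoc[i]
            matched_prev.add(assoc[i])
        else:                              # unassociated: new ID = largest ID + 1
            det["track_id"] = next_id
            next_id += 1
            new_set.append(det)            # goes to the newly added target set
    # previous-frame target frames with no match go to the lost target set
    lost_set = [t for tid, t in prev_tracks.items() if tid not in matched_prev]
    return lost_set, new_set, next_id
```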
7. The multi-target tracking method of claim 1, wherein: the feature extraction method is the HOG method, the bag-of-words (BoW) method, a VGG neural network model, a ResNet neural network model, or a NetVLAD neural network model.
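Of the listed options, the ResNet variant might look like the following torchvision-based sketch; the choice of ResNet-18, the 224x224 input size, and the normalisation constants are assumptions about a typical setup rather than values given in the patent.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T

# ResNet-18 backbone with the classification layer removed, used as a
# feature extractor for target image patches (one of the listed options).
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()
backbone.eval()

preprocess = T.Compose([
    T.ToPILImage(),
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def extract_feature(patch_bgr):
    """patch_bgr: HxWx3 uint8 crop of a target frame. Returns a 512-d vector."""
    rgb = patch_bgr[:, :, ::-1].copy()        # BGR -> RGB
    x = preprocess(rgb).unsqueeze(0)
    return backbone(x).squeeze(0).numpy()
```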
8. A multi-target tracking system for implementing the multi-target tracking method as claimed in any one of claims 1 to 7, characterized by comprising:
The photoelectric device is used for carrying a photoelectric camera and transmitting the image data to the computing device in a wired transmission mode or a wireless transmission mode;
The computing device is used for carrying a computing power unit and a multi-target tracking module, wherein the multi-target tracking module is used for realizing multi-target tracking of the image data to obtain a multi-target tracking result and outputting the multi-target tracking result to the result output device;
The result output device is used for outputting a multi-target tracking result in a wired transmission mode or a wireless transmission mode;
the multi-target tracking module includes: an image acquisition unit for acquiring image data from the optoelectronic device;
the target detection and camera motion estimation unit is used for carrying out target detection on the image data by using the multi-target detection model to obtain a target frame set; meanwhile, calculating a camera motion matrix M according to static region feature points of adjacent image frames in the image data;
The target motion compensation unit is used for performing motion compensation on the first target frame coordinate pH according to the target frame set and the camera motion matrix M to obtain a second target frame coordinate pC, and then assigning the second target frame coordinate pC back to the target frame;
The adjacent image frame target association unit is used for associating target frames of the adjacent image frames by using a multi-target association tracking method;
The multi-frame association recovery unit is used for setting a lost target set and a newly added target set, and obtaining a multi-target tracking result by using a lost recovery strategy;
The loss recovery strategy comprises a short-time loss recovery strategy and a long-time loss recovery strategy; the S5 multi-frame association recovery stage further comprises a lost target set management method, a newly added target set management method, a short-time loss judging method, the short-time loss recovery strategy, and the long-time loss recovery strategy;
The lost target set management method comprises a historical lost target coordinate updating stage, a lost target adding stage, and a lost target deleting stage. First, in the historical lost target coordinate updating stage, the coordinates of each lost target frame in the lost target set are predicted and the prediction result is updated to the lost target frame; the prediction comprises the following steps: let the target frame coordinates of any lost target frame in the lost target set at time t be R(tx1, ty1, tx2, ty2), and model R as a function of (deltaT, vx1, vy1, alpha, W, H),
wherein deltaT is the time difference between the prediction moment and the moment the target was lost, vx1 is the historical moving speed of the upper-left-corner abscissa of the lost target frame, vy1 is the historical moving speed of the lower-right-corner ordinate of the lost target frame, alpha is the expansion coefficient of the rectangular frame with a value between 0 and 1, W is the width of the image frame containing the lost target frame, H is the height of that image frame, and the historical moving speed is the speed at the moment of loss, or the average, median, or maximum speed; then, in the lost target adding stage, a lost feature vector is extracted with the feature extraction method from the historical target set of each lost target frame in the lost target set of the t-th image frame, and the lost feature vectors of the whole historical target set are stored as a history feature library, the historical target set being the target image regions cached before the lost target frame was lost; the lost target set of the t-th image frame and its history feature library are then added to the lost target set; finally, lost target frames whose loss duration exceeds the maximum lost-target existence duration are deleted from the lost target set;
The newly added target set management method comprises a historical newly added target coordinate updating stage, a newly added target adding stage, and a newly added target deleting stage. First, in the historical newly added target coordinate updating stage, the coordinates of the newly added target frames in the newly added target set are updated: the latest target frame coordinates of each newly added target frame are looked up in the target frame set through its target frame tracking ID and used for the update; then, in the newly added target adding stage, a newly added feature vector is extracted with the feature extraction method from each newly added target frame in the newly added target set of the t-th image frame; the newly added target set of the t-th image frame and the newly added feature vectors are then added to the newly added target set; finally, newly added target frames whose duration exceeds the maximum newly added target existence duration are deleted from the newly added target set;
The short-time loss judging method comprises the following steps: loss judgment is performed on each lost target frame in the lost target set; all lost target frames whose loss duration is smaller than the loss duration threshold are recorded as short-time lost target frames and stored in the short-time lost set; all lost target frames whose loss duration is greater than or equal to the loss duration threshold are recorded as long-time lost target frames and stored in the long-time lost set;
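The split by loss duration reduces to a simple partition, sketched here with an assumed lost_time field recorded when the target entered the lost target set:

```python
def split_lost_targets(lost_set, current_time, loss_threshold):
    """Split the lost target set into short-time and long-time lost sets
    by comparing each target's loss duration with the threshold."""
    short_term, long_term = [], []
    for target in lost_set:
        duration = current_time - target["lost_time"]
        (short_term if duration < loss_threshold else long_term).append(target)
    return short_term, long_term
```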
The short-time loss recovery strategy comprises the following steps: first, judge whether the short-time lost set or the newly added target set is empty; if either is empty, no short-time lost target frame is retrieved. If not, coordinate matching is performed: the short-time lost target frame is predicted to obtain a target prediction frame, the coordinates of the target prediction frame are compared with the coordinates of every newly added target frame in the newly added target set, and the newly added target frames lying completely inside the target prediction frame form the candidate newly added target set. Finally, feature matching is performed: the cosine similarity between the lost feature vector of each short-time lost target frame in the short-time lost set at the moment of loss and the newly added feature vector of each target frame in the candidate newly added target set is computed in turn, and the short-time lost target frame is retrieved according to the matching similarity. The formula is:
b = argmax_{b_j ∈ B_c} cos(x_a, x_{b_j}) = argmax_{b_j ∈ B_c} (x_a · x_{b_j}) / (||x_a|| ||x_{b_j}||),
wherein b is the target frame with the highest similarity in the candidate newly added target set, b_j is a target frame in the candidate newly added target set, B_c is the candidate newly added target set, x_a is the lost feature vector of the short-time lost target frame at the moment of loss, and x_{b_j} is the newly added feature vector of target frame b_j. If the similarity of b is greater than or equal to the feature matching similarity threshold, the short-time lost target frame is retrieved: the target frame tracking ID of b is set to the target frame tracking ID of the short-time lost target frame, b is deleted from the newly added target set and the candidate newly added target set, and the short-time lost target frame is deleted from the lost target set. If the similarity of b is smaller than the feature matching similarity threshold, the short-time lost target frame is not retrieved;
The long-time loss recovery strategy comprises the following steps: first, judge whether the long-time lost set or the newly added target set is empty; if either is empty, no long-time lost target frame is retrieved. If not, feature library matching is performed: the cosine similarity between the history feature library of the long-time lost target frame at the moment of loss and the newly added feature vector of each target frame in the newly added target set is computed in turn, and the long-time lost target frame is retrieved according to the matching similarity. The formula is:
b_l = argmax_{b_k ∈ B} cos(x_{a_l}, x_{b_k}) = argmax_{b_k ∈ B} (x_{a_l} · x_{b_k}) / (||x_{a_l}|| ||x_{b_k}||),
wherein b_l is the target frame with the highest similarity in the newly added target set, b_k is a target frame in the newly added target set, B is the newly added target set, a_l is a target frame in the long-time lost set, x_{a_l} is the first lost feature vector in the history feature library of the long-time lost target frame a_l at the moment of loss, and x_{b_k} is the newly added feature vector of target frame b_k. If the similarity of b_l is greater than or equal to the feature matching similarity threshold, the long-time lost target frame is retrieved: the target frame tracking ID of b_l is set to the target frame tracking ID of the long-time lost target frame, b_l is deleted from the newly added target set, and the long-time lost target frame is deleted from the lost target set. If the similarity of b_l is smaller than the feature matching similarity threshold, the long-time lost target frame is not retrieved.
9. A computer storage medium, characterized in that: the computer-readable storage medium stores computer-executable instructions which, when loaded and executed by a processor, implement the multi-target tracking method according to any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410517072.7A CN118115755B (en) | 2024-04-28 | 2024-04-28 | Multi-target tracking method, system and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN118115755A CN118115755A (en) | 2024-05-31 |
CN118115755B true CN118115755B (en) | 2024-06-28 |
Family
ID=91217367
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410517072.7A Active CN118115755B (en) | 2024-04-28 | 2024-04-28 | Multi-target tracking method, system and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118115755B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111612818A (en) * | 2020-05-07 | 2020-09-01 | 江苏新通达电子科技股份有限公司 | Novel binocular vision multi-target tracking method and system |
CN116433728A (en) * | 2023-03-27 | 2023-07-14 | 淮阴工学院 | DeepSORT target tracking method for shake blur scene |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11144761B2 (en) * | 2016-04-04 | 2021-10-12 | Xerox Corporation | Deep data association for online multi-class multi-object tracking |
CN114782484A (en) * | 2022-04-06 | 2022-07-22 | 上海交通大学 | Multi-target tracking method and system for detection loss and association failure |
CN116309731A (en) * | 2023-03-09 | 2023-06-23 | 江苏大学 | Multi-target dynamic tracking method based on self-adaptive Kalman filtering |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108960211B (en) | Multi-target human body posture detection method and system | |
CN112288770A (en) | Video real-time multi-target detection and tracking method and device based on deep learning | |
CN110120064B (en) | Depth-related target tracking algorithm based on mutual reinforcement and multi-attention mechanism learning | |
CN110751674A (en) | Multi-target tracking method and corresponding video analysis system | |
CN111104925B (en) | Image processing method, image processing apparatus, storage medium, and electronic device | |
CN110909591A (en) | Self-adaptive non-maximum value inhibition processing method for pedestrian image detection by using coding vector | |
CN111079604A (en) | Method for quickly detecting tiny target facing large-scale remote sensing image | |
CN110992378B (en) | Dynamic updating vision tracking aerial photographing method and system based on rotor flying robot | |
CN114694261A (en) | Video three-dimensional human body posture estimation method and system based on multi-level supervision graph convolution | |
CN113139416A (en) | Object association method, computer device, and storage medium | |
CN111950370A (en) | Dynamic environment offline visual milemeter expansion method | |
CN118115755B (en) | Multi-target tracking method, system and storage medium | |
CN113256683B (en) | Target tracking method and related equipment | |
CN103413326A (en) | Method and device for detecting feature points in Fast approximated SIFT algorithm | |
CN113963333A (en) | Traffic sign board detection method based on improved YOLOF model | |
CN116993779B (en) | Vehicle target tracking method suitable for monitoring video | |
CN111666822A (en) | Low-altitude unmanned aerial vehicle target detection method and system based on deep learning | |
CN113721240B (en) | Target association method, device, electronic equipment and storage medium | |
CN114972956A (en) | Target detection model training method, device, equipment and storage medium | |
CN111242980B (en) | Point target-oriented infrared focal plane blind pixel dynamic detection method | |
Vasekar et al. | A method based on background subtraction and Kalman Filter algorithm for object tracking | |
CN115131420B (en) | Visual SLAM method and device based on key frame optimization | |
CN117011766B (en) | Artificial intelligence detection method and system based on intra-frame differentiation | |
CN111738063B (en) | Ship target tracking method, system, computer equipment and storage medium | |
CN112396593B (en) | Closed loop detection method based on key frame selection and local features |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |