
CN115457079A - Tracking method for target person - Google Patents

Tracking method for target person

Info

Publication number
CN115457079A
Authority
CN
China
Prior art keywords
tracking
target
data set
tracker
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202211008836.7A
Other languages
Chinese (zh)
Inventor
张丽
林必贵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changzhou Suwen Jiuzhou Medical Technology Co ltd
Original Assignee
Changzhou Suwen Jiuzhou Medical Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changzhou Suwen Jiuzhou Medical Technology Co ltd filed Critical Changzhou Suwen Jiuzhou Medical Technology Co ltd
Priority to CN202211008836.7A priority Critical patent/CN115457079A/en
Publication of CN115457079A publication Critical patent/CN115457079A/en
Withdrawn legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G06T 7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/24 Aligning, centring, orientation detection or correction of the image
    • G06V 10/247 Aligning, centring, orientation detection or correction of the image by affine transforms, e.g. correction due to perspective effects; Quadrilaterals, e.g. trapezoids
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30196 Human being; Person
    • G06T 2207/30201 Face
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30241 Trajectory

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a tracking method for a target person, belonging to the technical field of visual target tracking and comprising the following steps. S1, establishing a motion model: the motion model models and estimates the motion trajectory of the target in the video, and the speed and quality of the candidate samples it generates directly determine the performance of the tracking system. S2, feature extraction: after the target search range of the current frame is determined according to the motion model, features are extracted from the candidate areas or candidate samples. S3, observation model: according to the observation model, tracking algorithms can be broadly divided into generative trackers and discriminative trackers; a generative tracker constructs the tracking model using only the foreground target information and selects the optimal sample by measuring the reconstruction error or similarity of the candidate samples. On the basis of tracking the video target, the invention can also extract the feature information of the patient from multiple angles and track multiple targets simultaneously.

Description

Tracking method for target person
Technical Field
The invention relates to the technical field of visual target tracking, in particular to a tracking method for a target person.
Background
Visual target tracking is one of the most challenging problems in the field of computer vision. The task of visual target tracking is to estimate the state of a target in subsequent frames of a video sequence, given the state (i.e., position, size and other information) of the target in an initial frame; video multi-target tracking means that multiple types of targets in the same scene need to be tracked. Video multi-target tracking comprises two steps, target detection and target tracking: common target detection methods include the optical flow method, the inter-frame difference method, the background subtraction method and deep-learning-based target detection; mature target tracking methods include the Kalman filter algorithm, the Meanshift algorithm and the Camshift algorithm.
The environment of a hospital is generally characterized by three aspects: large flows of people, backgrounds similar to the targets, and noise. Although visual target tracking technology has developed rapidly in recent years, its application remains difficult because of factors such as multiple targets, alternating occlusion among multiple targets, appearance deformation, rapid movement, illumination change, scale change and complex backgrounds during tracking; in video target tracking, changes in the appearance of the moving target and similarity between the target and the background prevent a patient from being tracked well. A tracking method for a target person is therefore provided.
Disclosure of Invention
1. Technical problem to be solved
In view of the problems in the prior art, the invention aims to provide a tracking method for a target person which, on the basis of tracking a video target, can extract the feature information of a patient from multiple angles and track multiple targets simultaneously.
2. Technical scheme
In order to solve the problems, the invention adopts the following technical scheme:
a tracking method for a target person, comprising:
s1, establishing a motion model: the motion model mainly models and estimates the motion track of the target in the video, and the speed and quality of the generated candidate sample directly determine the quality of the tracking system;
s2, feature extraction: after determining the target search range of the current frame according to the motion model, next, feature extraction needs to be carried out on a candidate area or a candidate sample;
s3, observing a model: according to different observation models, a tracking algorithm can be mainly divided into a generating type tracker and a discriminant type tracker, the generating type tracker only uses target information of a foreground to construct a tracking model, and an optimal sample is selected by measuring reconstruction errors or similarity of candidate samples;
s4, updating the model: the sparse representation tracker updates the sparse dictionary with the newly collected positive samples; updating a decision plane by using positive and negative samples collected in a subsequent frame based on a tracking algorithm of the SVM, and updating an initial filter by using a filter obtained in the subsequent frame through a related filter according to an exponential moving average strategy; continuously collecting new positive and negative samples by a tracker based on the classification network to finely adjust the classification network on line;
s5, updating the tracking data set: the trace data set contains two versions, OTB-2013 and OTB-2015. The OTB-2013 comprises 51 common test videos in the past tracking field, the data set and the evaluation standard provide a unified testing and evaluation environment for a tracking algorithm, the OTB-2015 data set is an extension of the OTB-2013 and comprises 100 challenging videos in total, and the data set also marks 10 video attributes of shielding, deformation, rapid movement, illumination change and blurring on the videos, so that the capability of the tracker for responding to different scenes can be analyzed conveniently;
s6, face locking: and (5) sampling the face of the patient for multiple times according to the S4 and the S5, and tracking the face of the patient.
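For orientation only, the following is a minimal Python skeleton of the S1-S6 pipeline described above; the class and method names are hypothetical stand-ins for whatever concrete components (motion model, feature extractor, observation model) an implementation would plug in, not the patented implementation itself:

    class TargetPersonTracker:
        """Hypothetical skeleton mirroring steps S1-S6."""

        def build_motion_model(self, previous_states):
            # S1: model the trajectory and propose candidate samples.
            raise NotImplementedError

        def extract_features(self, frame, candidates):
            # S2: extract features from candidate areas inside the search range.
            raise NotImplementedError

        def score_candidates(self, features):
            # S3: apply the observation model (generative or discriminative)
            # and return the best-scoring candidate.
            raise NotImplementedError

        def update_model(self, best_candidate):
            # S4: update dictionary / decision plane / filter / classifier.
            raise NotImplementedError

        def update_dataset(self, sequence, attributes):
            # S5: extend the tracking data set with attribute-annotated videos.
            raise NotImplementedError

        def lock_face(self, frame):
            # S6: sample the patient's face repeatedly and track it.
            raise NotImplementedError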
As a preferred scheme of the present invention, a CAM Shift target tracking algorithm is adopted in each frame of S1; the CAM Shift target tracking algorithm represents the color appearance of the target by 16-level quantization of the standard RGB color space, assigns different weights to pixels at different positions by a kernel function method (considering the importance of the target's central region and the susceptibility of peripheral points to noise), and establishes a target joint probability density distribution map by combining color with gray-image edge detection.
As a preferred scheme of the present invention, the discriminative feature representation in the S2 feature extraction is one of the keys to target tracking, and different feature representations are selected according to the condition of the target. A commonly used feature is the depth feature, which is learned from a large number of training samples; tracking methods using depth features can easily obtain good results in tracking tasks. A good feature expression depicts an appearance representation of the candidate target that is rich, robust and invariant (to rotation, deformation and illumination), and the statistical properties of the color histogram allow such algorithms to handle target deformation more robustly. Depth correlation filter algorithms commonly adopt multi-layer CNN features and jointly train the feature extraction network with the correlation filter, making the depth features better suited to correlation filter algorithms; this modeling manner of the correlation filter is widely applied in tracking frameworks. Trackers based on classification networks (such as MDNet and VITAL) mainly adopt a VGG-M network for feature extraction and train fully-connected layers online for sample classification, while the CAM Shift target tracking algorithm adopts a ResNet-50 network.
As a preferred embodiment of the present invention, common generative tracking frameworks in S3 include sparse representation and subspace learning, while the discriminative tracker considers foreground and background information simultaneously to learn a discriminative model; discriminative trackers include the random forest classifier, the SVM tracker and the correlation filter. The generative model is based on a subspace-learning tracking algorithm whose core idea is to map features from a high dimension to a low dimension, thereby constructing a series of subspaces to model the appearance of the target, and then to compute the reconstruction error or similarity of the candidate samples in those subspaces to pick out the most probable target. The discriminative model based on the SVM tracking algorithm distinguishes positive from negative samples through the learned classifier model. The correlation filter tracking algorithm processes the picture to be tracked by learning a discriminative filter and outputs a response map representing the confidence of different positions of the target in subsequent frames; the correlation filter exploits the properties of cyclic samples and circulant matrices to solve a ridge regression problem efficiently in the frequency domain with a closed-form solution, and correlation filters trained on depth features from different layers are fused in a coarse-to-fine manner.
As a preferred scheme of the present invention, in S4, contaminated positive samples collected during tracking may cause model degradation due to target occlusion, deformation and tracking drift. The SVM-based tracking algorithm enhances robustness by mining hard negative samples (hard negative mining) and suppresses redundant negative samples by designing the loss function. In order to better adapt to changes in the appearance of the target, the SVM-based tracking algorithm mines template information from historical frames using an LSTM (Long Short-Term Memory) structure to update the template of the current frame, trains an independent convolutional network that predicts the optimal template feature in the next frame from the historical templates, and updates the template using gradient information, which suppresses background information in the template to a certain extent.
As a preferred scheme of the invention, in the S2 feature extraction, the deep tracking network comprises a series of convolutional layers that extract a robust feature expression of the candidate samples, and the samples are then binary-classified by subsequent fully-connected layers. The method uses a classification network for target tracking; since a target in one video may become a background object in other videos, a multi-data-domain training framework is introduced, shared feature extraction is performed on the search area, and ROI-Align is then used to crop out the candidate sample features, increasing the tracking speed by more than 2.5 times while only slightly affecting accuracy.
As a preferred scheme of the present invention, the method of extracting the target color represented in the RGB color space in S2 is: a target color model (such as face color, arm color and skin color) is established during initialization, edge detection is performed on subsequent video images with the Sobel edge detection operator, and a joint probability density distribution map with different weights is obtained from the edge and color features.
As a preferred solution of the present invention, the solution of the tracking data set to occlusion in S5 is: a detection mechanism is used to judge whether the target is occluded and thereby decide whether to update the template, ensuring the robustness of the template against occlusion; the target may also be divided into a plurality of blocks so that the blocks that are not occluded can be used for effective tracking. For the situation where the target is completely occluded, no method can yet fully solve the problem;
the method for handling deformation in the video tracking data set in S5 is: updating the appearance model of the target to adapt to appearance changes;
the method for handling background clutter in the video tracking data set in S5 is: predicting the approximate motion trajectory from the motion information of the target to prevent the tracker from following other, similar targets, or updating and training the classifier with a large number of sample frames around the target, improving its ability to distinguish the background from the target;
the method for handling scale variation in the video tracking data set in S5 is: generating a large number of candidate frames at different scales when the motion model generates candidate samples, or tracking several targets of different scales to produce multiple predictions and selecting the optimal candidate frame as the final predicted target.
The method for handling motion blur in the video tracking data set in S5 is: the target area is blurred by the motion of the target or the camera, degrading tracking; tracking is performed by the common mean shift method, which can complete tracking using the information contained in the motion blur itself, without deblurring.
The method for handling illumination in the video tracking data set in S5 is: the RGB color information and texture information acquired from the RGB color space in S2 are fused by confidence to suppress shadows, improving the robustness of moving-target tracking under illumination changes;
the solution of the video tracking data set to rotation in S5 is: an affine transformation is introduced into the tracking module; it can rotate and translate the coordinate system or the bounding box of the target according to the transformation's degree-of-freedom parameters, realizing accurate target tracking.
The method for handling fast motion in the video tracking data set in S5 is: the motion trajectory of each target is finally drawn by a method based on region correspondence and the color-based minimum Euclidean distance.
The solution of the video tracking data set to the out-of-view problem in S5 is: a detector is introduced to complement the tracker in case of tracking failure (the TLD algorithm suggests that tracking and detection can facilitate each other); the tracker provides positive samples to the detector, and the detector re-initializes the tracker when tracking fails, so that tracking robustness is enhanced.
The solution of the video tracking data set to low resolution in S5 is: a target model is established by non-negative matrix factorization; iterative non-negative matrix factorization extracts the important contour information of the target and represents the target in the form of a dictionary matrix, thereby completing tracking.
3. Advantageous effects
Compared with the prior art, the invention has the advantages that:
the video multi-target tracking method based on multi-feature fusion effectively integrates multi-target identification, multi-target tracking, target feature extraction, video target structuring and target optimal matching, effectively improves detection efficiency through multi-target detection based on parallel, extracts target feature information, represents target color appearance through standard RGB color space 16-level quantization, gives different weights to pixel points at different positions by adopting a kernel function method in consideration of the fact that the importance of a target central area and the target peripheral points are easily influenced by noise, and establishes a target joint probability density distribution diagram by combining a color and gray level image edge detection method. Therefore, the robustness of the video multi-target tracking method is improved, the situations of multi-target motion overlapping, partial shielding and deformation are overcome by combining multi-target structuring, multi-target tracking and an optimal matching method based on characteristics, effective tracking of the video multi-target is realized, the video multi-target tracking method has the advantages of high speed and high efficiency, can be widely applied to actual combat, and can create certain economic benefit and use value.
Drawings
FIG. 1 is an overall flow chart of a tracking method for a target person according to the present invention;
FIG. 2 is a schematic diagram of the motion modeling used in the tracking method for the target person according to the present invention;
FIG. 3 is a schematic diagram of an observation model used in the tracking method of the target person according to the present invention;
FIG. 4 is a schematic diagram of a tracking data set in the tracking method for the target person according to the present invention;
FIG. 5 is a flow chart of extracting the target color represented in the RGB color space in the tracking method for a target person according to the present invention;
fig. 6 is a content processing diagram of a tracking data set in the tracking method for a target person according to the present invention.
Detailed Description
The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention. It is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments, and all other embodiments obtained by those skilled in the art without any inventive work are within the scope of the present invention.
In the description of the present invention, it should be noted that the terms "upper", "lower", "inner", "outer", "top/bottom", and the like indicate orientations or positional relationships based on those shown in the drawings, and are only for convenience of description and simplification of description, but do not indicate or imply that the referred device or element must have a specific orientation, be constructed in a specific orientation, and be operated, and thus should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
In the description of the present invention, it should be noted that, unless otherwise explicitly specified or limited, the terms "mounted," "disposed," "sleeved/connected," "connected," and the like are to be construed broadly, such as "connected," which may be a fixed connection, a detachable connection, or an integral connection; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood in specific cases to those skilled in the art.
Example:
Referring to FIGS. 1 to 6, a tracking method for a target person comprises:
S1, establishing a motion model: the motion model models and estimates the motion trajectory of the target in the video, and the speed and quality of the candidate samples it generates directly determine the performance of the tracking system;
S2, feature extraction: after the target search range of the current frame is determined according to the motion model, features are extracted from the candidate areas or candidate samples;
S3, observation model: according to the observation model, tracking algorithms can be broadly divided into generative trackers and discriminative trackers; a generative tracker constructs the tracking model using only the foreground target information and selects the optimal sample by measuring the reconstruction error or similarity of the candidate samples;
S4, updating the model: the sparse representation tracker updates the sparse dictionary with newly collected positive samples; the SVM-based tracking algorithm updates the decision plane with positive and negative samples collected in subsequent frames; the correlation filter updates the initial filter with the filters obtained in subsequent frames according to an exponential moving average strategy; the tracker based on a classification network continuously collects new positive and negative samples to fine-tune the classification network online;
S5, updating the tracking data set: the tracking data set comprises two versions, OTB-2013 and OTB-2015; OTB-2013 contains 51 test videos common in the tracking field, and the data set and its evaluation standard provide a unified test and evaluation environment for tracking algorithms; the OTB-2015 data set is an extension of OTB-2013 and contains 100 challenging videos in total, annotated with 10 video attributes such as occlusion, deformation, rapid movement, illumination change and blur, which facilitates analysing the tracker's ability to cope with different scenes;
S6, face locking: the face of the patient is sampled multiple times according to S4 and S5, and the face of the patient is tracked.
In the specific embodiment of the invention, the video multi-target tracking method based on multi-feature fusion effectively integrates multi-target identification, multi-target tracking, target feature extraction, video target structuring and optimal target matching. Parallel multi-target detection effectively improves detection efficiency; target feature information is extracted, the target color appearance is represented by 16-level quantization of the standard RGB color space, different weights are assigned to pixels at different positions by a kernel function method (considering the importance of the target's central region and the susceptibility of peripheral points to noise), and a target joint probability density distribution map is established by combining color with gray-image edge detection. The robustness of the video multi-target tracking method is thus improved; combined with multi-target structuring, multi-target tracking and feature-based optimal matching, the method copes with overlapping multi-target motion, partial occlusion and deformation, realizes effective tracking of multiple video targets, is fast and efficient, can be widely applied in practice, and can create certain economic benefit and practical value.
Specifically, a CAM Shift target tracking algorithm is adopted in each frame of S1; the CAM Shift target tracking algorithm represents the color appearance of the target by 16-level quantization of the standard RGB color space, assigns different weights to pixels at different positions by a kernel function method (considering the importance of the target's central region and the susceptibility of peripheral points to noise), and establishes a target joint probability density distribution map by combining color with gray-image edge detection.
In an embodiment of the invention, with the CAM Shift target tracking algorithm in place, the target color appearance can be characterized by 16-level quantization of the standard RGB color space, improving the accuracy of the patient model.
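As an illustration only, the following Python sketch runs OpenCV's standard CamShift loop; cv2.CamShift operates on a back-projected probability image, and the 16-bin hue histogram used here (rather than the patent's 16-level RGB quantization), the video file name and the initial box are simplifying assumptions:

    import cv2

    cap = cv2.VideoCapture("ward_camera.mp4")      # hypothetical input video
    ok, frame = cap.read()
    x, y, w, h = 300, 200, 80, 160                 # assumed initial target box
    roi = frame[y:y+h, x:x+w]

    hsv_roi = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
    # 16-bin histogram, echoing the patent's 16-level quantization (here over hue).
    roi_hist = cv2.calcHist([hsv_roi], [0], None, [16], [0, 180])
    cv2.normalize(roi_hist, roi_hist, 0, 255, cv2.NORM_MINMAX)

    track_window = (x, y, w, h)
    criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        # Back-projection plays the role of the joint probability density map.
        prob = cv2.calcBackProject([hsv], [0], roi_hist, [0, 180], 1)
        rot_box, track_window = cv2.CamShift(prob, track_window, criteria)
        pts = cv2.boxPoints(rot_box).astype("int32")
        cv2.polylines(frame, [pts], True, (0, 255, 0), 2)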
Specifically, the discriminative feature representation in the S2 feature extraction is one of the keys to target tracking, and different feature representations are selected according to the condition of the target. A commonly used feature is the depth feature, which is learned from a large number of training samples; tracking methods using depth features can easily obtain good results in tracking tasks. A good feature expression depicts an appearance representation of the candidate target that is rich, robust and invariant (to rotation, deformation and illumination), and the statistical properties of the color histogram allow such algorithms to handle target deformation more robustly. Depth correlation filter algorithms commonly adopt multi-layer CNN features and jointly train the feature extraction network with the correlation filter, making the depth features better suited to correlation filter algorithms; this modeling manner of the correlation filter is widely applied in tracking frameworks. Trackers based on classification networks (such as MDNet and VITAL) mainly adopt a VGG-M network for feature extraction and train fully-connected layers online for sample classification, while the CAM Shift target tracking algorithm adopts a ResNet-50 network.
In the specific embodiment of the invention, with depth features, representations learned from a large number of training samples can be used; a tracking method based on depth features easily obtains good results in tracking tasks, and a good feature expression depicts a rich, robust and invariant appearance representation of the candidate targets.
Specifically, common generative tracking frameworks in S3 include sparse representation and subspace learning, while the discriminative tracker considers foreground and background information simultaneously to learn a discriminative model; discriminative trackers include the random forest classifier, the SVM tracker and the correlation filter. The generative model is based on a subspace-learning tracking algorithm whose core idea is to map features from a high dimension to a low dimension, thereby constructing a series of subspaces to model the appearance of the target, and then to compute the reconstruction error or similarity of the candidate samples in those subspaces to pick out the most probable target. The discriminative model based on the SVM tracking algorithm distinguishes positive from negative samples through the learned classifier model. The correlation filter tracking algorithm processes the picture to be tracked by learning a discriminative filter and outputs a response map representing the confidence of different positions of the target in subsequent frames; the correlation filter exploits the properties of cyclic samples and circulant matrices to solve a ridge regression problem efficiently in the frequency domain with a closed-form solution, and correlation filters trained on depth features from different layers are fused in a coarse-to-fine manner.
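To make the closed-form frequency-domain solution concrete, here is a minimal single-channel correlation-filter sketch in NumPy, in the MOSSE/KCF style, including the exponential-moving-average update mentioned in S4; the Gaussian label width, regularization and learning rate are illustrative assumptions:

    import numpy as np

    def gaussian_label(h, w, sigma=2.0):
        # Desired response: a Gaussian peak centred on the target.
        ys, xs = np.mgrid[:h, :w]
        return np.exp(-((ys - h // 2) ** 2 + (xs - w // 2) ** 2) / (2 * sigma ** 2))

    def train_filter(patch, y, lam=1e-2):
        # Ridge regression solved in the frequency domain (circulant structure):
        # H* = (Y . conj(X)) / (X . conj(X) + lambda), elementwise.
        X = np.fft.fft2(patch)
        Y = np.fft.fft2(y)
        return (Y * np.conj(X)) / (X * np.conj(X) + lam)

    def respond(H, patch):
        # Response map: confidence of the target at each position.
        return np.real(np.fft.ifft2(H * np.fft.fft2(patch)))

    def update_filter(H_old, H_new, lr=0.02):
        # Exponential moving average update of the initial filter (S4).
        return (1 - lr) * H_old + lr * H_new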
In the specific embodiment of the invention, with the discriminative model of the SVM-based tracking algorithm, positive and negative samples can be distinguished by the classifier model learned by the SVM, so that they are better analyzed and compared.
Specifically, in S4, contaminated positive samples collected during tracking may cause model degradation due to target occlusion, deformation and tracking drift. The SVM-based tracking algorithm enhances robustness by mining hard negative samples (hard negative mining) and suppresses redundant negative samples by designing the loss function. In order to better adapt to changes in the appearance of the target, the SVM-based tracking algorithm mines template information from historical frames using an LSTM (Long Short-Term Memory) structure to update the template of the current frame, trains an independent convolutional network that predicts the optimal template feature in the next frame from the historical templates, and updates the template using gradient information, which suppresses background information in the template to a certain extent.
In the specific embodiment of the invention, template information from historical frames is mined with the LSTM structure in the SVM-based tracking algorithm to update the template of the current frame; an independent convolutional network is trained, and the optimal template feature in the next frame is predicted from the historical templates, improving the accuracy of the method.
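The hard-negative-mining loop can be sketched as follows, with scikit-learn's LinearSVC standing in for the SVM and synthetic features standing in for real ones; the feature dimension, pool size and number of mining rounds are assumptions:

    import numpy as np
    from sklearn.svm import LinearSVC

    def mine_hard_negatives(clf, neg_feats, keep=256):
        # The hardest negatives are those the classifier scores highest,
        # i.e. closest to (or beyond) the decision plane.
        scores = clf.decision_function(neg_feats)
        hardest = np.argsort(scores)[-keep:]
        return neg_feats[hardest]

    rng = np.random.default_rng(0)
    pos = rng.normal(1.0, 1.0, (200, 128))         # assumed positive features
    neg_pool = rng.normal(-1.0, 1.0, (5000, 128))  # large pool of negatives

    clf = LinearSVC(C=1.0, max_iter=5000)
    clf.fit(np.vstack([pos, neg_pool[:200]]),
            np.hstack([np.ones(200), np.zeros(200)]))

    for _ in range(3):                             # a few mining rounds
        hard = mine_hard_negatives(clf, neg_pool)
        X = np.vstack([pos, hard])
        y = np.hstack([np.ones(len(pos)), np.zeros(len(hard))])
        clf.fit(X, y)                              # retrain on hard negatives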
Specifically, in the S2 feature extraction, the deep tracking network comprises a series of convolutional layers that extract a robust feature expression of the candidate samples, and the samples are then binary-classified by subsequent fully-connected layers. The method uses a classification network for target tracking; since a target in one video may become a background object in other videos, a multi-data-domain training framework is introduced, shared feature extraction is performed on the search region, and ROI-Align is then used to crop out the candidate sample features, increasing the tracking speed by more than 2.5 times while only slightly affecting accuracy.
In the specific embodiment of the invention, the deep tracking network classifies the samples, so that multiple targets can be tracked simultaneously.
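The shared-feature-plus-ROI-Align idea can be sketched with torchvision's roi_align; the toy backbone, box coordinates and output size below are placeholders, not the network of the method:

    import torch
    import torch.nn as nn
    from torchvision.ops import roi_align

    backbone = nn.Sequential(            # stand-in for the shared feature extractor
        nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
    )

    frame = torch.randn(1, 3, 256, 256)  # search area of the current frame
    feats = backbone(frame)              # shared features, computed once

    # Candidate boxes in image coordinates: (batch_index, x1, y1, x2, y2).
    boxes = torch.tensor([[0, 32., 40., 96., 168.],
                          [0, 60., 52., 124., 180.]])

    # Crop per-candidate features from the shared map instead of re-running
    # the backbone per candidate -- this is where the speed-up comes from.
    crops = roi_align(feats, boxes, output_size=(7, 7), spatial_scale=0.25)
    print(crops.shape)                   # torch.Size([2, 128, 7, 7])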
Specifically, the method of extracting the target color represented in the RGB color space in S2 is as follows: a target color model (such as face color, arm color and skin color) is established during initialization, edge detection is performed on subsequent video images with the Sobel edge detection operator, and a joint probability density distribution map with different weights is obtained from the edge and color features.
In the specific embodiment of the invention, extracting the target color represented in the RGB color space in S2 improves the accuracy of target identification, so that targets can be better distinguished.
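A sketch of this extraction under stated assumptions: a hue-histogram color model built at initialization, combined with Sobel edge magnitude into a weighted joint map; the 0.7/0.3 fusion weights are illustrative:

    import cv2
    import numpy as np

    def init_color_model(roi_bgr):
        # Target color model built at initialization (e.g. face/arm skin color).
        hsv = cv2.cvtColor(roi_bgr, cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([hsv], [0], None, [16], [0, 180])
        cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)
        return hist

    def joint_probability_map(frame_bgr, color_hist, w_color=0.7, w_edge=0.3):
        hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
        color_prob = cv2.calcBackProject([hsv], [0], color_hist, [0, 180], 1)

        gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
        gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)  # Sobel edge detection
        gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)
        edges = cv2.normalize(cv2.magnitude(gx, gy), None, 0, 255,
                              cv2.NORM_MINMAX).astype(np.uint8)

        # Weighted fusion of color and edge cues into one probability map.
        return cv2.addWeighted(color_prob, w_color, edges, w_edge, 0)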
Specifically, the solution of the tracking data set to occlusion in S5 is as follows: a detection mechanism is used to judge whether the target is occluded and thereby decide whether to update the template, ensuring the robustness of the template against occlusion; the target may also be divided into a plurality of blocks so that the blocks that are not occluded can be used for effective tracking. For the situation where the target is completely occluded, no method can yet fully solve the problem;
S5, handling deformation in the video tracking data set: updating the appearance model of the target to adapt to appearance changes;
S5, handling background clutter in the video tracking data set: the approximate motion trajectory is predicted from the motion information of the target to prevent the tracker from following other, similar targets, or the classifier is updated and trained with a large number of sample frames around the target, improving its ability to distinguish the background from the target;
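The trajectory-prediction idea just described can be sketched with a constant-velocity Kalman filter via OpenCV's cv2.KalmanFilter; the noise covariances are illustrative assumptions:

    import numpy as np
    import cv2

    # State: (x, y, vx, vy); measurement: (x, y). Constant-velocity model.
    kf = cv2.KalmanFilter(4, 2)
    kf.transitionMatrix = np.array([[1, 0, 1, 0],
                                    [0, 1, 0, 1],
                                    [0, 0, 1, 0],
                                    [0, 0, 0, 1]], np.float32)
    kf.measurementMatrix = np.array([[1, 0, 0, 0],
                                     [0, 1, 0, 0]], np.float32)
    kf.processNoiseCov = np.eye(4, dtype=np.float32) * 1e-2
    kf.measurementNoiseCov = np.eye(2, dtype=np.float32) * 1e-1

    def step(measured_xy):
        # Predict the approximate position before correcting with the detection,
        # which helps reject similar targets far from the predicted trajectory.
        predicted = kf.predict()
        kf.correct(np.array(measured_xy, np.float32).reshape(2, 1))
        return predicted[:2].ravel()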
S5, handling scale variation in the video tracking data set: when the motion model generates candidate samples, a large number of candidate frames at different scales are generated, or several targets of different scales are tracked to produce multiple predictions, and the optimal one is selected as the final predicted target.
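A minimal sketch of multi-scale candidate generation; the scale factors are an assumption:

    def scale_candidates(box, factors=(0.9, 0.95, 1.0, 1.05, 1.1)):
        # box = (cx, cy, w, h); generate candidate boxes at several scales
        # around the previous target size, as the motion model proposes samples.
        cx, cy, w, h = box
        return [(cx, cy, w * s, h * s) for s in factors]

    candidates = scale_candidates((160.0, 120.0, 40.0, 80.0))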
In S5, the method for handling motion blur in the video tracking data set is: the target area is blurred by the motion of the target or the camera, degrading tracking; tracking is performed by the common mean shift method, which can complete tracking using the information contained in the motion blur itself, without deblurring.
S5, handling illumination in the video tracking data set: the RGB color information and texture information acquired from the RGB color space in S2 are fused by confidence to suppress shadows, improving the robustness of moving-target tracking under illumination changes;
S5, handling rotation in the video tracking data set: an affine transformation is introduced into the tracking module; it can rotate and translate the coordinate system or the bounding box of the target according to the transformation's degree-of-freedom parameters, realizing accurate target tracking.
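An illustrative sketch of rotating a target bounding box with an affine transform in OpenCV; the angle and scale here stand in for the transform's degree-of-freedom parameters:

    import cv2
    import numpy as np

    def rotate_box(corners, center, angle_deg, scale=1.0):
        # corners: 4x2 array of box corner coordinates.
        M = cv2.getRotationMatrix2D(center, angle_deg, scale)  # 2x3 affine matrix
        pts = np.hstack([corners, np.ones((4, 1))])            # homogeneous coords
        return (M @ pts.T).T                                   # rotated corners

    box = np.array([[100., 80.], [180., 80.], [180., 220.], [100., 220.]])
    rotated = rotate_box(box, center=(140., 150.), angle_deg=15.0)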
In S5, the method for handling fast motion in the video tracking data set is: the motion trajectory of each target is finally drawn by a method based on region correspondence and the color-based minimum Euclidean distance.
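The color-based minimum-Euclidean-distance association can be sketched as follows, with mean BGR color as an assumed per-target descriptor:

    import numpy as np

    def associate(track_colors, det_colors):
        # Assign each detection to the track with the minimum Euclidean
        # distance in color space, then append it to that track's trajectory.
        assignments = []
        for j, c in enumerate(det_colors):
            d = np.linalg.norm(track_colors - c, axis=1)
            assignments.append((int(np.argmin(d)), j))
        return assignments

    tracks = np.array([[120., 90., 60.], [40., 200., 180.]])  # mean BGR per track
    dets = np.array([[118., 95., 58.], [45., 190., 185.]])
    print(associate(tracks, dets))   # [(0, 0), (1, 1)]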
S5, handling the out-of-view problem in the video tracking data set: a detector is introduced to complement the tracker in case of tracking failure (the TLD algorithm suggests that tracking and detection can facilitate each other); the tracker provides positive samples to the detector, and the detector re-initializes the tracker when tracking fails, so that tracking robustness is enhanced.
The solution of the video tracking data set to low resolution in S5 is: a target model is established by non-negative matrix factorization; iterative non-negative matrix factorization extracts the important contour information of the target and represents the target in the form of a dictionary matrix, thereby completing tracking.
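A sketch of the non-negative matrix factorization target model using scikit-learn's NMF; treating vectorized non-negative target patches as rows of the data matrix, and the component count, are assumptions about the setup:

    import numpy as np
    from sklearn.decomposition import NMF

    rng = np.random.default_rng(0)
    V = rng.random((32 * 32, 50))      # 50 historical 32x32 target patches, vectorized

    # V.T ~ H @ W: W holds the dictionary (contour-like basis), H the codes.
    nmf = NMF(n_components=8, init="nndsvda", max_iter=400)
    H = nmf.fit_transform(V.T)         # codes per patch
    W = nmf.components_                # dictionary matrix representing the target

    def reconstruction_error(patch_vec):
        # Candidate scoring: smaller error => more likely the target.
        code = nmf.transform(patch_vec.reshape(1, -1))
        return np.linalg.norm(patch_vec - code @ W)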
In the embodiment of the invention, by introducing this handling of different conditions into the video tracking data set in S5, the method can deal with various special situations, which increases its practicability.
The above description is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent replacement or change made by a person skilled in the art, within the technical scope disclosed by the present invention and according to the technical solution and inventive concept of the present invention, shall be covered by the scope of protection of the present invention.

Claims (8)

1. A tracking method for a target person, comprising:
S1, establishing a motion model: the motion model models and estimates the motion trajectory of the target in the video, and the speed and quality of the candidate samples it generates directly determine the performance of the tracking system;
S2, feature extraction: after the target search range of the current frame is determined according to the motion model, features are extracted from the candidate areas or candidate samples;
S3, observation model: according to the observation model, tracking algorithms can be broadly divided into generative trackers and discriminative trackers; a generative tracker constructs the tracking model using only the foreground target information and selects the optimal sample by measuring the reconstruction error or similarity of the candidate samples;
S4, updating the model: the sparse representation tracker updates the sparse dictionary with newly collected positive samples; the SVM-based tracking algorithm updates the decision plane with positive and negative samples collected in subsequent frames; the correlation filter updates the initial filter with the filters obtained in subsequent frames according to an exponential moving average strategy; the tracker based on a classification network continuously collects new positive and negative samples to fine-tune the classification network online;
S5, updating the tracking data set: the tracking data set comprises two versions, OTB-2013 and OTB-2015; OTB-2013 contains 51 test videos common in the tracking field, and the data set and its evaluation standard provide a unified test and evaluation environment for tracking algorithms; the OTB-2015 data set is an extension of OTB-2013 and contains 100 challenging videos in total, annotated with 10 video attributes such as occlusion, deformation, rapid movement, illumination change and blur, which facilitates analysing the tracker's ability to cope with different scenes;
S6, face locking: the face of the patient is sampled multiple times according to S4 and S5, and the face of the patient is tracked.
2. The method as claimed in claim 1, wherein a CAM Shift target tracking algorithm is used in each frame of S1; the CAM Shift target tracking algorithm represents the color appearance of the target by 16-level quantization of the standard RGB color space, assigns different weights to pixels at different positions by a kernel function method (considering the importance of the target's central region and the susceptibility of peripheral points to noise), and establishes a target joint probability density distribution map by combining color with gray-image edge detection.
3. The method as claimed in claim 1, wherein the discriminative feature representation in the S2 feature extraction is one of the keys to target tracking, and different feature representations are selected according to the condition of the target; a commonly used feature is the depth feature, which is learned from a large number of training samples, and tracking methods using depth features can easily obtain good results in tracking tasks; a good feature expression depicts an appearance representation of the candidate target that is rich, robust and invariant (to rotation, deformation and illumination), and the statistical properties of the color histogram allow such algorithms to handle target deformation more robustly; depth correlation filter algorithms commonly adopt multi-layer CNN features and jointly train the feature extraction network with the correlation filter, making the depth features better suited to correlation filter algorithms, and this modeling manner of the correlation filter is widely applied in tracking frameworks; trackers based on classification networks (such as MDNet and VITAL) mainly adopt a VGG-M network for feature extraction and train fully-connected layers online for sample classification, while the CAM Shift target tracking algorithm adopts a ResNet-50 network.
4. The method as claimed in claim 1, wherein common generative tracking frameworks in S3 include sparse representation and subspace learning, while the discriminative tracker considers foreground and background information simultaneously to learn a discriminative model; discriminative trackers include the random forest classifier, the SVM tracker and the correlation filter; the generative model is based on a subspace-learning tracking algorithm whose core idea is to map features from a high dimension to a low dimension, constructing a series of subspaces to model the appearance of the target, and then to compute the reconstruction error or similarity of the candidate samples in those subspaces to pick out the most probable target; the discriminative model based on the SVM tracking algorithm distinguishes positive from negative samples through the learned classifier model; the correlation filter tracking algorithm processes the picture to be tracked by learning a discriminative filter and outputs a response map representing the confidence of different positions of the target in subsequent frames; the correlation filter exploits the properties of cyclic samples and circulant matrices to solve a ridge regression problem efficiently in the frequency domain with a closed-form solution, and correlation filters trained on depth features from different layers are fused in a coarse-to-fine manner.
5. The method as claimed in claim 1, wherein in S4 contaminated positive samples collected during tracking may cause model degradation due to target occlusion, deformation and tracking drift; the SVM-based tracking algorithm enhances robustness by mining hard negative samples (hard negative mining) and suppresses redundant negative samples by designing the loss function; in order to better adapt to changes in the appearance of the target, the SVM-based tracking algorithm mines template information from historical frames using an LSTM (Long Short-Term Memory) structure to update the template of the current frame, trains an independent convolutional network that predicts the optimal template feature in the next frame from the historical templates, and updates the template using gradient information, which suppresses background information in the template to a certain extent.
6. The method as claimed in claim 1, wherein in the S2 feature extraction the deep tracking network comprises a series of convolutional layers that extract a robust feature expression of the candidate samples, and the samples are then binary-classified by subsequent fully-connected layers; the method uses a classification network for target tracking, and since a target in one video may become a background object in other videos, a multi-data-domain training framework is introduced, shared feature extraction is performed on the search region, and ROI-Align is then used to crop out the candidate sample features, increasing the tracking speed by more than 2.5 times while only slightly affecting accuracy.
7. The method for tracking the target person as claimed in claim 1, wherein the method of extracting the target color represented in the RGB color space in S2 is: establishing a target color model (such as face color, arm color and skin color) during initialization, performing edge detection on subsequent video images with the Sobel edge detection operator, and obtaining a joint probability density distribution map with different weights from the edge and color features.
8. The method for tracking the target person as claimed in claim 1, wherein the solution of the tracking data set to occlusion in S5 is: a detection mechanism is used to judge whether the target is occluded and thereby decide whether to update the template, ensuring the robustness of the template against occlusion; the target may also be divided into a plurality of blocks so that the blocks that are not occluded can be used for effective tracking, while no method can yet fully solve the situation where the target is completely occluded;
the method for handling deformation in the video tracking data set in S5 is: updating the appearance model of the target to adapt to appearance changes;
the method for handling background clutter in the video tracking data set in S5 is: predicting the approximate motion trajectory from the motion information of the target to prevent the tracker from following other, similar targets, or updating and training the classifier with a large number of sample frames around the target to improve its ability to distinguish the background from the target;
the method for handling scale variation in the video tracking data set in S5 is: generating a large number of candidate frames at different scales when the motion model generates candidate samples, or tracking several targets of different scales to produce multiple predictions and selecting the optimal one as the final predicted target;
the method for handling motion blur in the video tracking data set in S5 is: when the target area is blurred by the motion of the target or the camera and tracking degrades, tracking with the common mean shift method, which can complete tracking using the information contained in the motion blur itself, without deblurring;
the method for handling illumination in the video tracking data set in S5 is: fusing, by confidence, the RGB color information and texture information acquired from the RGB color space in S2 to suppress shadows, improving the robustness of moving-target tracking under illumination changes;
the solution of the video tracking data set to rotation in S5 is: introducing an affine transformation into the tracking module, which can rotate and translate the coordinate system or the bounding box of the target according to the transformation's degree-of-freedom parameters, realizing accurate target tracking;
the method for handling fast motion in the video tracking data set in S5 is: finally drawing the motion trajectory of each target by a method based on region correspondence and the color-based minimum Euclidean distance;
the solution of the video tracking data set to the out-of-view problem in S5 is: introducing a detector to complement the tracker in case of tracking failure (the TLD algorithm suggests that tracking and detection can facilitate each other); the tracker provides positive samples to the detector, and the detector re-initializes the tracker when tracking fails, enhancing tracking robustness;
the solution of the video tracking data set to low resolution in S5 is: establishing a target model by non-negative matrix factorization, extracting the important contour information of the target through iterative non-negative matrix factorization, and representing the target in the form of a dictionary matrix, thereby completing tracking.
CN202211008836.7A 2022-08-22 2022-08-22 Tracking method for target person Withdrawn CN115457079A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211008836.7A CN115457079A (en) 2022-08-22 2022-08-22 Tracking method for target person

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211008836.7A CN115457079A (en) 2022-08-22 2022-08-22 Tracking method for target person

Publications (1)

Publication Number Publication Date
CN115457079A true CN115457079A (en) 2022-12-09

Family

ID=84297887

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211008836.7A Withdrawn CN115457079A (en) 2022-08-22 2022-08-22 Tracking method for target person

Country Status (1)

Country Link
CN (1) CN115457079A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118379331A (en) * 2024-06-24 2024-07-23 南京卓宇智能科技有限公司 Ground target stable tracking algorithm under complex background


Similar Documents

Publication Publication Date Title
Xu et al. Detecting tiny objects in aerial images: A normalized Wasserstein distance and a new benchmark
Basalamah et al. Scale driven convolutional neural network model for people counting and localization in crowd scenes
CN108470332B (en) Multi-target tracking method and device
Dang et al. A Feature Matching Method based on the Convolutional Neural Network.
CN108470354A (en) Video target tracking method, device and realization device
CN104200495A (en) Multi-target tracking method in video surveillance
CN107424171A (en) A kind of anti-shelter target tracking based on piecemeal
CN106815323B (en) Cross-domain visual retrieval method based on significance detection
CN111460976B (en) Data-driven real-time hand motion assessment method based on RGB video
CN110472081B (en) Shoe picture cross-domain retrieval method based on metric learning
CN112085765A (en) Video target tracking method combining particle filtering and metric learning
Li et al. Robust object tracking with discrete graph-based multiple experts
CN117541994A (en) Abnormal behavior detection model and detection method in dense multi-person scene
CN114861761A (en) Loop detection method based on twin network characteristics and geometric verification
Huang et al. Tracking-by-detection of 3d human shapes: from surfaces to volumes
Kanaujia et al. Part segmentation of visual hull for 3d human pose estimation
CN115457079A (en) Tracking method for target person
CN110910497A (en) Method and system for realizing augmented reality map
CN114627156A (en) Consumption-level unmanned aerial vehicle video moving target accurate tracking method
CN106023256A (en) State observation method for planar target particle filter tracking of augmented reality auxiliary maintenance system
CN114743257A (en) Method for detecting and identifying image target behaviors
Moridvaisi et al. An extended KCF tracking algorithm based on TLD structure in low frame rate videos
Yang Face feature tracking algorithm of aerobics athletes based on Kalman filter and mean shift
CN112183215B (en) Human eye positioning method and system combining multi-feature cascading SVM and human eye template
Liu et al. [Retracted] Mean Shift Fusion Color Histogram Algorithm for Nonrigid Complex Target Tracking in Sports Video

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication (application publication date: 20221209)