CN110276784B - Correlation filtering moving target tracking method based on memory mechanism and convolution characteristics - Google Patents
Correlation filtering moving target tracking method based on memory mechanism and convolution characteristics
- Publication number
- CN110276784B (application CN201910478278.2A)
- Authority
- CN
- China
- Prior art keywords
- classifier
- peak
- target
- interference
- response
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20048—Transform domain processing
- G06T2207/20056—Discrete and fast Fourier transform, [DFT, FFT]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
Abstract
The invention provides a correlation filtering moving target tracking method based on a memory mechanism and convolutional features, and belongs to the technical field of computer vision. The method uses a pre-trained deep convolutional neural network to extract the convolutional features of the target and, inspired by the memory mechanism of the human brain in visual information processing, integrates a memory mechanism into the detection, training and updating of the classifier in the correlation filtering method. The memory mechanism consists of three parts: response map decision, adaptive peak detection and adaptive fusion coefficient. The method is more robust and continues to track the target stably when the target deforms severely, is occluded, or reappears after briefly disappearing. The method also tracks faster, with lower complexity and a smaller amount of computation.
Description
Technical Field
The invention relates to a method for tracking a moving target in an image sequence, in particular to a correlation filtering moving target tracking method based on a memory mechanism and convolutional features, and belongs to the technical field of computer vision.
Background
Moving target tracking is an important research direction in computer vision and is widely applied in fields such as security monitoring, human-computer interfaces and medical diagnosis. Its main problem at present is that tracking accuracy drops because complex interference factors are hard to overcome, such as changes in background illumination, target occlusion, shape change, size change and rapid movement.
The discriminative tracking method is an important class of moving target tracking methods. It includes the Multiple Instance Learning (MIL) tracking method, the Tracking-Learning-Detection (TLD) tracking method, the Struck (Structured Output Tracking with Kernels) tracking method, and others. The principle of such methods is as follows: first, a classifier is trained with the target as the positive sample and the background as the negative sample; then, the search area is examined with the classifier, and the point with the maximum response is taken as the target center position. Such methods typically train the classifier by sparse sampling, that is, by taking several equally sized windows around the target as samples. However, as the number of samples increases, the amount of calculation also increases, which reduces the real-time performance of the tracking method.
The correlation filtering tracking method alleviates the insufficient training samples and heavy computation of discriminative tracking methods by constructing a circulant matrix of the samples. For example, the KCF algorithm proposed by Henriques et al. (Henriques J F, Rui C, Martins P, et al. "High-Speed Tracking with Kernelized Correlation Filters". IEEE Transactions on Pattern Analysis & Machine Intelligence, 2014, 37(3): 583-) realizes the correlation filtering process through kernel-based ridge regression. The algorithm runs in real time and tracks a moving target accurately under nonlinear conditions.
In recent years, research results from deep learning have begun to be combined with correlation filtering tracking methods. For example, the HCF algorithm (Ma C, Huang J B, Yang X, et al. Hierarchical Convolutional Features for Visual Tracking [C]// IEEE International Conference on Computer Vision. IEEE Computer Society, 2015: 3074-3082.) replaces HOG features with hierarchical convolutional features within the framework of the KCF algorithm. Because high-level features contain more semantic information and low-level features contain more local information such as textures and contours, the approximate position of the target is determined from the highest-level features and then refined layer by layer downward. This makes the method more robust than traditional hand-crafted features.
Although correlation filtering algorithms that use convolutional features have these advantages, they also have limitations. First, convolutional features are extracted twice per frame, once for detection and once for training, so the amount of calculation is very large. Second, the target template and the classifier are updated at a fixed rate in every frame, so the ability to adapt to drastic changes of the target is poor. Therefore, when the target changes shape abruptly, is severely occluded, or reappears after briefly disappearing, the tracking accuracy drops markedly and the target may even be lost; it is also difficult to meet real-time requirements.
Disclosure of Invention
The invention aims to track a target accurately and quickly under interference such as sudden changes in posture and shape, occlusion, and reappearance after brief disappearance, and provides a correlation filtering moving target tracking method based on a memory mechanism and convolutional features.
The method uses a pre-trained deep convolutional neural network to extract the convolutional features of the target. Inspired by the memory mechanism of the human brain in visual information processing, a memory mechanism is integrated into the detection, training and updating of the classifier in the correlation filtering method. The memory mechanism consists of three parts: response map decision, adaptive peak detection and adaptive fusion coefficient. The memory mechanism is fused with the detection, training and updating of the classifier as follows:
(1) Classifier detection based on response map decision: after the convolutional features of the candidate region are extracted, every classifier in the memory space is convolved with the candidate region to obtain its own response map, and the response map with the largest peak is selected to locate the target.
(2) Classifier training based on adaptive peak detection: after the target is located, the magnitude and position relationships between the main peak and the secondary (interference) peak in the response map are combined to analyze how the target has changed. If the interference degree is greater than the threshold, the convolutional features of the target are extracted again and a new classifier is trained. Otherwise, the classifier is neither trained nor updated.
(3) Classifier updating based on the adaptive fusion coefficient: after a new classifier is trained, the fusion coefficient is calculated adaptively from the result of peak detection. The more severe the interference, the larger the fusion coefficient.
In this way, the memory mechanism is organically integrated with the tracking method.
The method of the invention is realized by the following specific steps:
A correlation filtering moving target tracking method based on a memory mechanism and convolutional features comprises the following steps:
Step 1: Initialize the memory space.
The capacity of the memory space is set to m. During frames 1 to m the memory space is filled and the memory mechanism is not yet executed. After the classifier has been trained in the i-th frame, its parameters are stored in the memory space as the i-th classifier w[i], i ∈ {1, ..., m}. Apart from initializing the memory space, the method follows the same steps as a general correlation filtering tracking method during these frames. Once the memory space is full, the memory mechanism is executed in the subsequent frames.
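The initialization in Step 1 can be sketched in Python as follows. This is only an illustrative sketch: the class name `MemorySpace`, its methods and the stand-in parameter arrays are hypothetical and do not come from the patent, and indices are 0-based rather than 1-based.

```python
import numpy as np


class MemorySpace:
    """Fixed-capacity store of classifier parameters (hypothetical helper)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.classifiers = []  # w[1..m] in the text, stored 0-indexed here

    def is_full(self):
        return len(self.classifiers) >= self.capacity

    def store(self, w):
        # During frames 1..m each newly trained classifier fills one slot.
        if self.is_full():
            raise RuntimeError("memory space already filled")
        self.classifiers.append(np.array(w, copy=True))


memory = MemorySpace(capacity=4)
for frame_index in range(1, 5):
    # Stand-in for the classifier parameters trained in this frame.
    w_i = np.full((2, 2), float(frame_index))
    memory.store(w_i)
```

Once `memory.is_full()` returns true, a tracker built on this sketch would switch from plain correlation filtering to the memory-based steps described next.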
Step 2: Perform classifier detection based on response map decision.
Step 2.1: Extract the convolutional features of the candidate region in the current frame.
Read the t-th frame image (t > m) and select a candidate region according to the target center position determined in the previous frame. The convolutional features of the tracking window are extracted with a pre-trained convolutional neural network: after the candidate region image is input into the network, the outputs of the layers in the set L, chosen from among the 19 layers of the network, are taken as the convolutional features $x_t$. The feature of the candidate region at layer l at time t is denoted $x_t[l]$, l ∈ L.
After the convolutional features $x_t$ are extracted, a circulant matrix is constructed with $x_t$ as the generating matrix to obtain the test sample $C(x_t)$.
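The role of the circulant matrix can be illustrated with a small one-dimensional NumPy example. It is a sketch under simplifying assumptions (a 1-D real signal instead of 2-D multi-channel features, and a hypothetical helper `circulant`), showing that multiplying by the matrix of all cyclic shifts gives the same result as an element-wise product in the frequency domain.

```python
import numpy as np


def circulant(x):
    """Matrix whose k-th row is the generating vector x cyclically shifted by k."""
    n = len(x)
    return np.stack([np.roll(x, k) for k in range(n)])


x = np.array([1.0, 2.0, 3.0, 4.0])    # generating vector (stand-in for features)
w = np.array([0.5, -1.0, 0.25, 2.0])  # stand-in for classifier parameters
C = circulant(x)

# Dense product: entry k is sum_j x[(j - k) mod n] * w[j], a circular correlation.
dense = C @ w

# The same values from an element-wise product of Fourier transforms.
fast = np.real(np.fft.ifft(np.conj(np.fft.fft(x)) * np.fft.fft(w)))
```

The dense product costs O(n^2) operations while the FFT route costs O(n log n), which is where the speed of correlation filtering comes from.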
Step 2.2: all classifiers in the memory space are detected.
Let wt-1[i,l]Parameters representing the ith classifier learned before the tth frame in memory space correspond to the ith layer features, i ∈ { 1., m }, L ∈ L. With the test sample C (x)t) And (4) convolving with the classifier to obtain a response map, and regarding the position of the maximum response value on the response map as the target position.
As can be seen from the properties of the circulant matrix, the convolution of an arbitrary matrix and the circulant matrix in the time domain can be expressed as a dot product of the convolution and the generation matrix of the circulant matrix in the frequency domain. The response f of each layer characteristict[i,l]Adding according to fixed weight to obtain the response image f of the ith classifier in the memory space at the t framet[i]:
Wherein,denotes an Inverse Fast Fourier Transform (IFFT) operation, which is a dot product operator, capital letters denote Fourier transform forms of variables, and γ is a fusion weight. Xt[l]Representing a fourier transform version of the ith layer features at frame t.
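A minimal NumPy sketch of the layer-weighted response follows. It assumes one feature channel per layer and that all layers have already been resized to a common spatial size; the function name `fused_response` and the random stand-in data are hypothetical, and the weights 0.25, 0.5 and 1 are the values quoted in the embodiment.

```python
import numpy as np


def fused_response(W_layers, X_layers, gamma):
    """Weighted sum over layers of IFFT(W * X); all inputs are 2-D FFTs."""
    response = np.zeros(X_layers[0].shape)
    for W, X, g in zip(W_layers, X_layers, gamma):
        # Element-wise product in the frequency domain, then back to space.
        response += g * np.real(np.fft.ifft2(W * X))
    return response


rng = np.random.default_rng(0)
x_layers = [rng.standard_normal((8, 8)) for _ in range(3)]  # stand-in features
w_layers = [rng.standard_normal((8, 8)) for _ in range(3)]  # stand-in classifier
X_layers = [np.fft.fft2(x) for x in x_layers]
W_layers = [np.fft.fft2(w) for w in w_layers]

f = fused_response(W_layers, X_layers, gamma=[0.25, 0.5, 1.0])
```

In a full tracker this function would be called once per classifier in the memory space, giving m response maps per frame.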
All m classifiers in the memory space are convolved with the cyclic sample to obtain m response maps. The response map with the largest response peak is used to estimate the target position, and the classifier corresponding to that response map is the one that is subsequently trained and updated:
$$\pi = \arg\max_{i \in \{1,\dots,m\}} \max f_t[i]$$
where $\pi$ is the index, in the memory space, of the classifier whose response map has the largest peak.
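The response map decision amounts to an argmax over the peak values of the m response maps. The sketch below uses hypothetical toy maps and a hypothetical helper `select_classifier`, and returns 0-based indices.

```python
import numpy as np


def select_classifier(response_maps):
    """Return the index of the map with the highest peak and the peak position."""
    peaks = [float(np.max(f)) for f in response_maps]
    pi = int(np.argmax(peaks))
    row, col = np.unravel_index(np.argmax(response_maps[pi]),
                                response_maps[pi].shape)
    return pi, (int(row), int(col))


# Four toy response maps, one per classifier in a memory space with m = 4.
maps = [np.zeros((5, 5)) for _ in range(4)]
maps[0][1, 1] = 0.3
maps[1][2, 3] = 0.9
maps[2][4, 0] = 0.5
maps[3][0, 0] = 0.1

pi, position = select_classifier(maps)
```

Here the second classifier (index 1) wins because its peak of 0.9 is the largest, and the target is placed at row 2, column 3 of its response map.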
Step 3: Perform classifier training based on adaptive peak detection.
Step 3.1: Adaptive peak detection.
The positions and the peak values of the main peak and the interference peak on the response map are calculated and compared together, where the secondary peak other than the main peak is selected as the interference peak. When the interference peak is far from the main peak, the target is considered not occluded even if the interference peak is high; when the interference peak is close to the main peak, the target is judged to be occluded even if the interference peak is not high. The target state is judged by the peak interference degree ρ:
$$\rho = \max\left(0,\; \frac{h - S(\vec{r})}{h}\right)$$
where the coordinate system of the response map is redefined with the center of the main peak as the origin, H is the peak value of the main peak on the response map, h is the peak value of the interference peak, M is the distance from the main peak to the edge of the response map in the direction of the interference peak, $\vec{r}$ is the position vector of the interference peak relative to the main peak, and $S(\vec{r})$ is a constructed paraboloid determined by H and M. If the interference peak is higher than this surface, the target is considered to have changed drastically. ρ is the ratio of the amount by which the interference peak exceeds the surface to the full height of the interference peak. If ρ = 0, all the following steps are skipped, the classifier is neither trained nor updated, and processing moves directly to the next frame. If ρ > 0, the following steps are executed:
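The peak interference degree can be sketched as below. The text does not give the explicit form of the paraboloid, so the surface $S(\vec{r}) = H(|\vec{r}|/M)^2$ used here is purely an assumption chosen so that the surface is zero at the main peak and reaches H at the map edge; the helper names are hypothetical as well.

```python
import numpy as np


def distance_to_edge(main_rc, direction, shape):
    """Distance from main_rc to the map border along `direction` (row, col)."""
    d = np.asarray(direction, dtype=float)
    d = d / np.linalg.norm(d)
    limits = []
    for p, step, size in zip(main_rc, d, shape):
        if step > 0:
            limits.append((size - 1 - p) / step)
        elif step < 0:
            limits.append(-p / step)
    return min(limits)


def peak_interference(H, h, r, M):
    """rho = max(0, (h - S(r)) / h), with the ASSUMED surface S = H * (|r| / M)**2."""
    surface = H * (np.linalg.norm(r) / M) ** 2
    return max(0.0, (h - surface) / h)


shape = (21, 21)
main = (10, 10)             # main peak position, H = 1.0
r_near = np.array([0, 3])   # low interference peak close to the main peak
r_far = np.array([0, 9])    # higher interference peak far from the main peak

M = distance_to_edge(main, r_near, shape)
rho_near = peak_interference(H=1.0, h=0.4, r=r_near, M=M)
rho_far = peak_interference(H=1.0, h=0.6, r=r_far, M=M)
```

With these toy numbers the nearby peak of height 0.4 yields a positive ρ and triggers retraining, while the distant peak of height 0.6 stays below the assumed surface and yields ρ = 0, matching the qualitative behaviour described in the text.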
Step 3.2: Extract the convolutional features of the target region in the current frame.
According to the positioning result of the current frame from Step 2, the target center is taken as the center and expanded to obtain a target region of the same size as the candidate region. The target region is input into the convolutional neural network, and its convolutional features $x_t'$ are extracted.
Step 3.3: Train the classifier.
A peak interference degree ρ > 0 indicates that the classifier corresponding to the maximum-peak response map selected in Step 3.1 matches the target poorly, so a new classifier $w_t'$ must be trained to adapt to the change in the target.
The classifier is trained on the same principle as in general correlation filtering methods. The classifier parameters $w_t'[l]$ corresponding to the l-th layer features are obtained by minimizing:
$$w_t'[l] = \arg\min_{w} \left\| w \star x_t'[l] - y \right\|^2 + \lambda \left\| w \right\|_2^2$$
where $x_t'[l]$ are the features extracted at the new position for training, $\star$ is the convolution operator, and λ is the $\ell_2$ regularization parameter. y is the target label function for training, a two-dimensional Gaussian function of the same size as the classifier with its peak at the center.
The closed-form solution to this minimization problem is:
$$W_t'[l] = \frac{Y \odot \bar{X}_t'[l]}{X_t'[l] \odot \bar{X}_t'[l] + \lambda}$$
where capital letters denote Fourier transforms, the bar denotes complex conjugation, and the division is element-wise.
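As an illustration of training in the frequency domain, the sketch below uses the standard single-channel ridge regression solution $W = Y \odot \bar{X} / (X \odot \bar{X} + \lambda)$. That closed form, the helper names, the label width `sigma` and the random stand-in features are all assumptions made for the example rather than details confirmed by the text.

```python
import numpy as np


def gaussian_label(shape, sigma):
    """2-D Gaussian label with its peak at the centre of the window."""
    rows = np.arange(shape[0]) - shape[0] // 2
    cols = np.arange(shape[1]) - shape[1] // 2
    rr, cc = np.meshgrid(rows, cols, indexing="ij")
    return np.exp(-(rr ** 2 + cc ** 2) / (2.0 * sigma ** 2))


def train_classifier(x, y, lam):
    """Fourier-domain ridge regression: W = Y * conj(X) / (X * conj(X) + lam)."""
    X = np.fft.fft2(x)
    Y = np.fft.fft2(y)
    return Y * np.conj(X) / (X * np.conj(X) + lam)


rng = np.random.default_rng(1)
x = rng.standard_normal((16, 16))       # stand-in for one layer of features
y = gaussian_label(x.shape, sigma=2.0)  # desired response, peak at the centre
W = train_classifier(x, y, lam=1e-4)

# Applying the trained classifier to its own training sample recovers the label.
response = np.real(np.fft.ifft2(W * np.fft.fft2(x)))
peak = np.unravel_index(np.argmax(response), response.shape)
```

No matrix inversion is needed: every frequency bin is solved independently by one complex division, which is why the training step is cheap compared with feature extraction.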
Step 4: Update the classifier based on the adaptive fusion coefficient.
After the new classifier parameters $w_t'$ have been trained, the classifiers in the memory space are updated. The classifier $w_{t-1}[\pi]$ is fused with $w_t'$ by weighting, and the parameters of the remaining classifiers are unchanged:
$$w_t[i] = \begin{cases} (1-\lambda)\, w_{t-1}[\pi] + \lambda\, w_t', & i = \pi \\ w_{t-1}[i], & i \neq \pi \end{cases}$$
where λ here denotes the fusion coefficient of the classifier at the current frame (not the regularization parameter of Step 3.3). It is obtained adaptively from the peak interference degree ρ through a Sigmoid function, in which e is the base of the natural logarithm. λ increases monotonically with ρ, so that the more drastic the change in the target, the faster the classifier is updated.
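The update step can be sketched as follows. The exact Sigmoid used to map ρ to the fusion coefficient is not given in the text, so the `steepness` and `midpoint` parameters below are illustrative assumptions, as are the function names; only the monotonic increase with ρ and the weighted fusion of the selected classifier are taken from the description.

```python
import numpy as np


def fusion_coefficient(rho, steepness=10.0, midpoint=0.5):
    """ASSUMED sigmoid of rho; steepness and midpoint are illustrative only."""
    return 1.0 / (1.0 + np.exp(-steepness * (rho - midpoint)))


def update_memory(classifiers, pi, w_new, rho):
    """Fuse the new classifier into slot pi; all other slots stay unchanged."""
    lam = fusion_coefficient(rho)
    updated = list(classifiers)
    updated[pi] = (1.0 - lam) * classifiers[pi] + lam * w_new
    return updated, lam


classifiers = [np.zeros(2), np.full(2, 2.0)]  # toy memory space with m = 2
w_new = np.full(2, 4.0)                       # stand-in for the new classifier

updated, lam = update_memory(classifiers, pi=1, w_new=w_new, rho=0.5)
```

With ρ equal to the assumed midpoint the coefficient is 0.5, so the selected slot becomes the average of the old and new parameters; larger ρ pushes the slot further toward the newly trained classifier.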
Advantageous effects
Compared with existing moving target tracking methods, the method of the invention has the following advantages:
(1) Strong robustness. By integrating the human brain memory mechanism into the correlation filtering algorithm, the algorithm can memorize the states of the target during tracking. On the one hand, response map decision is used to select the most suitable classifier from the memory space for detection. On the other hand, adaptive peak detection controls classifier training: the convolutional features of the target are extracted again only when the target changes drastically, and the classifier is updated with a fusion coefficient calculated adaptively from the peak detection result. As a result, the target can be tracked continuously and stably when it deforms severely, is occluded, or reappears after briefly disappearing.
(2) High tracking speed. On the one hand, within the correlation filtering framework the training samples of the classifier are constructed by cyclic shifts, and the problem is solved in the frequency domain based on the properties of the circulant matrix, which avoids matrix inversion and greatly reduces the complexity of the algorithm. On the other hand, the classifier parameters of the target in different states are stored in the memory space. When a similar state occurs again, the matching classifier is selected directly according to the response value, and the CNN features of the target region do not need to be extracted again for retraining, so the amount of computation is reduced by half.
Drawings
FIG. 1 is a schematic flow chart of the method of the present invention;
FIG. 2 is a schematic diagram illustrating the classifier detection step based on response map decision in the method of the present invention;
FIG. 3 is a schematic diagram illustrating the classifier training steps based on adaptive peak detection in the method of the present invention;
FIG. 4 is a schematic diagram illustrating the classifier updating step based on adaptive fusion coefficients in the method of the present invention;
FIG. 5 is a flow chart showing the method of the present invention;
FIG. 6 is a comparison of the tracking results of the method of the present invention and the conventional HCF method;
FIG. 7 is a graph of tracking accuracy for the method of the present invention and a conventional HCF method;
FIG. 8 is a comparison of the tracking index of the method of the present invention and the conventional HCF method.
Detailed Description
The method of the present invention will be described in detail with reference to the accompanying drawings and examples.
Examples
A correlation filtering moving target tracking method based on a memory mechanism and convolutional features, as shown in FIG. 2, comprises the following steps:
Step 1: Initialize the memory space.
Let the capacity of the memory space be m = 4. In frames 1 to 4, the method of the invention is identical to a general correlation filtering tracking method except that the memory space is initialized. After the classifier has been trained in the i-th frame, its parameters are stored in the memory space as the i-th classifier. At the end of frame 4 the memory space is full, and the memory mechanism is executed in the subsequent frames.
Step 2: Classifier detection based on response map decision.
Step 2.1: Extract the convolutional features of the candidate region in the current frame.
Read the t-th frame image and select a candidate region according to the target center position determined in the previous frame. The method of the invention extracts the convolutional features of the tracking window with a pre-trained VGG-19 convolutional neural network. After the candidate region image is input into the network, the outputs of Conv3-4, Conv4-4 and Conv5-4, chosen from among the 19 layers of the network, are taken as the convolutional features, that is, L = {Conv3-4, Conv4-4, Conv5-4}. The feature of the candidate region at layer l at time t is denoted $x_t[l]$, l ∈ L.
After the convolutional features $x_t$ are extracted, a circulant matrix is constructed with $x_t$ as the generating matrix to obtain the test sample $C(x_t)$.
Step 2.2: detection of all classifiers in memory space.
Let wt-1[i,l]Parameters representing the ith classifier learned before the tth frame in memory space corresponding to the ith layer features, i ∈ {1,2,3,4}, L ∈ L. With the test sample C (x)t) Convolution with the classifier can obtain a response map, and the position of the maximum response value on the response map is regarded as the target position.
As can be seen from the properties of the circulant matrix, the convolution of an arbitrary matrix and the circulant matrix in the time domain can be expressed as a dot product of the convolution and the generator matrix of the circulant matrix in the frequency domain. The response f of each layer characteristict[i,l]Adding according to fixed weight to obtain the response image f of the ith classifier in the memory space at the t framet[i]:
Wherein,denotes an Inverse Fast Fourier Transform (IFFT) operation, which is a dot product operator, an upper case represents a fourier transform form of a variable, and γ is a fusion weight, which is set to be {0.25,0.5,1 }.
All classifiers in the memory space are convolved with the cyclic sample to obtain m response maps, and the response map with the largest response peak is used to estimate the target position. The classifier corresponding to that response map is the one that is subsequently trained and updated:
$$\pi = \arg\max_{i \in \{1,\dots,m\}} \max f_t[i]$$
where $\pi$ is the index, in the memory space, of the classifier whose response map has the largest peak.
Step 3: Classifier training based on adaptive peak detection.
Step 3.1: Adaptive peak detection.
The core idea of adaptive peak detection is to calculate and compare the positions and the peak values of the main peak and the interference peak on the response map together. The secondary peak other than the main peak is selected as the interference peak. When the interference peak is far from the main peak, the target is considered not occluded even if the interference peak is high; when the interference peak is close to the main peak, the target is judged to be occluded even if the interference peak is not high. The target state is judged by the peak interference degree ρ, calculated as:
$$\rho = \max\left(0,\; \frac{h - S(\vec{r})}{h}\right)$$
where the coordinate system of the response map is redefined with the center of the main peak as the origin, H is the peak value of the main peak on the response map, h is the peak value of the interference peak, M is the distance from the main peak to the edge of the response map in the direction of the interference peak, $\vec{r}$ is the position vector of the interference peak relative to the main peak, and $S(\vec{r})$ is a constructed paraboloid determined by H and M. If the interference peak is higher than this surface, the target is considered to have changed drastically. ρ is the ratio of the amount by which the interference peak exceeds the surface to the full height of the interference peak. If ρ = 0, the following steps are skipped, the classifier is neither trained nor updated, and processing moves directly to the next frame.
When the peak interference degree ρ > 0, the following steps are performed.
Step 3.2: Extract the convolutional features of the target region in the current frame.
According to the positioning result of the current frame from Step 2, the target center is taken as the center and expanded to obtain a target region of the same size as the candidate region. The target region is input into the VGG-19 network, and its convolutional features $x_t'$ are extracted.
Step 3.3: Train the classifier.
A peak interference degree ρ > 0 means that the classifier corresponding to the maximum-peak response map selected in Step 3.1 matches the target poorly, so a new classifier $w_t'$ must be trained to adapt to the change in the target.
The classifier is trained on the same principle as in general correlation filtering methods. The classifier parameters $w_t'[l]$ corresponding to the l-th layer features are obtained by minimizing:
$$w_t'[l] = \arg\min_{w} \left\| w \star x_t'[l] - y \right\|^2 + \lambda \left\| w \right\|_2^2$$
where $\star$ is the convolution operator, λ is the $\ell_2$ regularization parameter, and y is the target label function for training, a two-dimensional Gaussian function of the same size as the classifier with its peak at the center.
The closed-form solution to this minimization problem is:
$$W_t'[l] = \frac{Y \odot \bar{X}_t'[l]}{X_t'[l] \odot \bar{X}_t'[l] + \lambda}$$
where capital letters denote Fourier transforms, the bar denotes complex conjugation, and the division is element-wise.
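For readers who want the missing reasoning step, a closed form of this kind can be derived as sketched below. The derivation assumes a single feature channel, a circular convolution $\star$, and a window of N samples; it is the standard ridge regression argument for correlation filters and is offered as an illustration rather than as the patent's own derivation.

$$
\begin{aligned}
E(w) &= \left\| w \star x - y \right\|^2 + \lambda \left\| w \right\|_2^2
      = \frac{1}{N} \sum_{k} \Big( \left| W_k X_k - Y_k \right|^2 + \lambda \left| W_k \right|^2 \Big)
      && \text{(Parseval, convolution theorem)} \\
\frac{\partial E}{\partial \bar{W}_k} &= \frac{1}{N} \Big( \bar{X}_k \left( W_k X_k - Y_k \right) + \lambda W_k \Big) = 0 \\
W_k &= \frac{Y_k \bar{X}_k}{X_k \bar{X}_k + \lambda}
\end{aligned}
$$

Each frequency bin k decouples from the others, so the solution is obtained by one element-wise division instead of a matrix inversion.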
Step 4: Update the classifier based on the adaptive fusion coefficient.
After the new classifier parameters $w_t'$ have been trained, the classifiers in the memory space are updated. The classifier $w_{t-1}[\pi]$ is fused with $w_t'$ by weighting, and the parameters of the remaining classifiers are unchanged:
$$w_t[i] = \begin{cases} (1-\lambda)\, w_{t-1}[\pi] + \lambda\, w_t', & i = \pi \\ w_{t-1}[i], & i \neq \pi \end{cases}$$
where λ here denotes the fusion coefficient of the classifier at the current frame (not the regularization parameter of Step 3.3), obtained adaptively from the peak interference degree ρ through a Sigmoid function.
λ increases monotonically with ρ, so that the more drastic the change in the target, the faster the classifier is updated.
The effect of the invention is illustrated by the following simulation experiment:
1. Simulation conditions:
The simulation experiment was run on video sequences from the Visual Tracker Benchmark test set, using MATLAB 2017b on a PC with an Intel(R) Core(TM) i7-7700HQ CPU at 2.80 GHz, 8.00 GB of RAM and a GTX 1050 GPU.
2. Simulation results:
FIG. 3 shows the tracking results on a video sequence in which the target is clearly occluded, at frames 330, 371, 390 and 410. The rectangular boxes in the figure mark the tracking results of the conventional method and of the method of the invention. As can be seen from FIG. 3, the method tracks the target accurately while the moving target reappears after being clearly occluded.
FIG. 4 compares the tracking precision curves of the method of the invention and the conventional HCF algorithm. The abscissa of the tracking precision curve is the Euclidean distance between the target center in the simulated tracking result and the true center annotated in the ground truth, and the ordinate is the proportion of frames, out of the length of the whole test video sequence, whose Euclidean distance is smaller than a given threshold. FIG. 5 compares tracking precision and tracking speed (FPS: frames per second) at a distance threshold of 20 pixels. According to the evaluation statistics for the Lemming sequence, the probability that the tracking result lies within 20 pixels of the actual target position is 0.6820 for the conventional HCF algorithm and 0.8920 for the method of the invention, an improvement in tracking precision of 30.8%. When the CNN computation runs on the GPU, the speeds of the conventional HCF algorithm and the proposed algorithm are 4.4751 fps and 5.1678 fps respectively, an improvement of 15.5%; when the CNN computation runs on the CPU, the speeds of the two algorithms are 1.1653 fps and 2.1363 fps respectively, an improvement of 83.3%.
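The precision metric and the quoted relative improvements can be checked with a short Python sketch. The functions `precision_at` and `relative_gain` and the two-frame toy trajectory are hypothetical; the precision and speed figures are the ones reported above.

```python
import numpy as np


def precision_at(pred_centers, true_centers, threshold):
    """Fraction of frames whose center error is smaller than the threshold."""
    pred = np.asarray(pred_centers, dtype=float)
    true = np.asarray(true_centers, dtype=float)
    errors = np.linalg.norm(pred - true, axis=1)
    return float(np.mean(errors < threshold))


def relative_gain(baseline, improved):
    """Relative improvement over the baseline, in percent."""
    return (improved - baseline) / baseline * 100.0


# Toy trajectory: one frame on target, one frame 30 pixels off.
toy_precision = precision_at([[0, 0], [30, 0]], [[0, 0], [0, 0]], threshold=20)

precision_gain = relative_gain(0.6820, 0.8920)  # precision at 20 pixels
gpu_gain = relative_gain(4.4751, 5.1678)        # fps, CNN on the GPU
cpu_gain = relative_gain(1.1653, 2.1363)        # fps, CNN on the CPU
```

Rounded to one decimal place, the three gains come out as 30.8%, 15.5% and 83.3%, which agrees with the improvements stated in the simulation results.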
Claims (3)
1. A correlation filtering moving target tracking method based on a memory mechanism and convolution characteristics, characterized by comprising the following steps:
first, initializing a memory space, as follows:
setting the capacity of the memory space to m; the memory space is filled while frames 1 to m are processed, during which the memory mechanism is not yet executed; after classifier training is completed in the i-th frame, the classifier parameters are stored in the memory space as the i-th classifier w[i], i ∈ {1, ..., m}; once the memory space is full, the memory mechanism is executed from the following frame onward;
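The initialization can be pictured with a small sketch. The class `MemorySpace`, the function `warm_up` and the `train` callback are hypothetical names introduced here for illustration; the claim does not say how a classifier is trained during frames 1 to m, so that step is passed in as a placeholder.

```python
class MemorySpace:
    """Fixed-capacity store of classifier parameters (illustrative helper)."""

    def __init__(self, capacity):
        self.capacity = capacity   # m in the claim
        self.classifiers = []      # w[1..m], stored 0-indexed here

    def is_full(self):
        return len(self.classifiers) >= self.capacity

    def fill(self, params):
        """Store the classifier trained on the current frame during filling."""
        if self.is_full():
            raise RuntimeError("memory space is already full")
        self.classifiers.append(params)


def warm_up(memory, frames, train):
    """Process frames until the memory is full, storing one classifier per frame.

    Returns the number of frames consumed; the memory mechanism would start
    on the frame after that.
    """
    used = 0
    for frame in frames:
        if memory.is_full():
            break
        memory.fill(train(frame))
        used += 1
    return used


# Toy run with m = 3 and a stand-in "training" function.
memory = MemorySpace(3)
frames_used = warm_up(memory, ["f1", "f2", "f3", "f4", "f5"], lambda f: f.upper())
```

In this toy run the first three frames fill the memory and the remaining frames are left for the memory mechanism.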
secondly, extracting the convolution features of the target with a pre-trained deep convolutional neural network, and integrating a memory mechanism into the detection, training and update-fusion process of the classifier of the correlation filtering method, wherein the memory mechanism consists of three parts: response map decision, adaptive peak detection and adaptive fusion coefficients; specifically, the integration comprises the following steps:
the classifier detection based on the response map decision comprises: after the convolution features of the candidate region are extracted, every classifier in the memory space is convolved with the candidate region to obtain its own response map, and the response map with the largest peak is selected to locate the target;
the classifier training based on adaptive peak detection comprises the following steps:
Step A: adaptive peak detection;
the positions and peak values of the main peak and the interference peak on the response map are computed and compared together, the secondary peak other than the main peak being taken as the interference peak; when the interference peak is far from the main peak, the target is considered not occluded even if the interference peak is high, and when the interference peak is close to the main peak, the target is judged to be occluded even if the interference peak is not high;
the target state is judged by the peak interference degree, given by the following formula:
wherein the coordinate system of the response map is redefined with the center of the main peak as the origin, H is the peak value of the main peak on the response map, h is the peak value of the interference peak, and M is the distance from the main peak to the edge of the response map in the direction of the interference peak; the formula further involves the position vector of the interference peak relative to the main peak and a constructed paraboloid; if the interference peak rises above this surface, the target is considered to have changed drastically; ρ is the ratio of the height by which the interference peak exceeds the surface to the full height of the interference peak; if the peak interference degree ρ = 0, the subsequent steps are skipped, the classifier is neither trained nor updated, and processing moves directly to the next frame; when the peak interference degree ρ > 0, the following steps are executed:
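The behaviour described for ρ can be illustrated with a sketch. The exact paraboloid of the claim is not available in this text, so the surface below, H·(d/M)², is an assumption chosen only because it is zero at the main peak and reaches H at the edge of the response map; the names `paraboloid_height` and `peak_interference`, and the symbol h for the interference peak value, are likewise illustrative.

```python
import math

def paraboloid_height(dist, main_peak, edge_dist):
    """Assumed reference surface: 0 at the main peak, main_peak at the map edge."""
    return main_peak * (dist / edge_dist) ** 2

def peak_interference(main_peak, interf_peak, offset, edge_dist):
    """Return rho: the share of the interference peak that lies above the surface.

    main_peak   -- H, value of the main peak
    interf_peak -- h, value of the interference (secondary) peak
    offset      -- (dx, dy) of the interference peak relative to the main peak
    edge_dist   -- M, distance from the main peak to the map edge in that direction
    """
    dist = math.hypot(offset[0], offset[1])
    surface = paraboloid_height(dist, main_peak, edge_dist)
    excess = interf_peak - surface
    if interf_peak <= 0 or excess <= 0:
        return 0.0          # interference peak stays below the surface
    return excess / interf_peak

# A low peak close to the main peak exceeds the surface ...
rho_near = peak_interference(1.0, 0.3, (3, 4), 20.0)
# ... while a higher peak far from the main peak does not.
rho_far = peak_interference(1.0, 0.6, (18, 0), 20.0)
```

Under this assumed surface a nearby interference peak yields ρ > 0 even when it is low, and a distant one yields ρ = 0 even when it is high, which matches the qualitative rule stated in step A.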
Step B: extracting the convolution features of the current-frame target region;
according to the target position obtained for the current frame in the classifier detection step, a target region of the same size as the candidate region is obtained by expanding around the target center; the target region is input into the convolutional neural network and its convolution features x'_t are extracted;
Step C: training a classifier;
a peak interference degree ρ > 0 indicates that the classifier corresponding to the response map with the largest peak, selected in step A, matches the target poorly, so a new classifier w'_t needs to be trained to adapt to the change of the target;
the classifier parameters w'_t[l] corresponding to the l-th layer features are trained by minimizing the following objective:
wherein x'_t[l] denotes the features extracted at the new position for training, the operation between the classifier and the features is convolution, and λ is the l2 regularization parameter; y is the target label function used for training, a two-dimensional Gaussian function of the same size as the classifier with its peak at the center;
the closed-form solution to this minimization problem is:
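For orientation only, the following sketch trains and applies a one-dimensional, single-channel correlation filter in the Fourier domain. It assumes the standard ridge-regression form W = Y·conj(X) / (X·conj(X) + λ) used by correlation filter trackers, which is not taken from the claim's own formula; the claim works on two-dimensional, multi-layer CNN features. The helper names and the toy signal are hypothetical, and a naive DFT is used to stay within the standard library.

```python
import cmath
import math

def dft(signal, inverse=False):
    """Naive O(N^2) discrete Fourier transform of a list of numbers."""
    n = len(signal)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t, value in enumerate(signal):
            acc += value * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out

def gaussian_label(n, sigma=1.5):
    """1-D Gaussian label y with its peak at the center index n // 2."""
    c = n // 2
    return [math.exp(-((i - c) ** 2) / (2 * sigma ** 2)) for i in range(n)]

def train_filter(x, y, lam=1e-4):
    """Closed-form ridge regression, element-wise in the Fourier domain."""
    X, Y = dft(x), dft(y)
    return [Yk * Xk.conjugate() / (Xk * Xk.conjugate() + lam)
            for Xk, Yk in zip(X, Y)]

def respond(W, z):
    """Response f = IDFT(W * Z); its argmax gives the estimated position."""
    Z = dft(z)
    f = dft([Wk * Zk for Wk, Zk in zip(W, Z)], inverse=True)
    return [v.real for v in f]

# Train on a toy signal, then detect a cyclically shifted copy of it.
n, shift = 16, 3
x = [0.5 ** i for i in range(n)]
W = train_filter(x, gaussian_label(n))
z = [x[(i - shift) % n] for i in range(n)]
resp = respond(W, z)
peak = max(range(n), key=lambda i: resp[i])   # expected at n // 2 + shift
```

The response peak moves from the label center by exactly the shift applied to the sample, which is the property a tracker uses to read the target displacement off the response map.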
the classifier updating based on the adaptive fusion coefficient comprises: after a new classifier is trained, a fusion coefficient is computed adaptively from the peak detection result, and the more severe the interference, the larger the fusion coefficient.
2. The correlation filtering moving target tracking method based on a memory mechanism and convolution characteristics as claimed in claim 1, wherein the classifier detection based on the response map decision is as follows:
Step 2.1: extracting the convolution features of the current-frame candidate region
setting the memory space capacity to m; reading the t-th frame image, t > m, and selecting a candidate region according to the target center position determined in the previous frame; extracting the convolution features of the tracking window with a pre-trained convolutional neural network; after the candidate region image is input into the convolutional neural network, the outputs of the layers in the set L among the 19 convolutional layers are selected as the convolution features x_t, the features of the candidate region at the l-th layer at time t being denoted x_t[l], l ∈ L;
after the convolution features x_t are extracted, a circulant matrix is constructed with x_t as its generator, yielding the test samples C(x_t);
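A circulant matrix stacks every cyclic shift of its generator, so each row acts as one shifted test sample. The sketch below shows this for a one-dimensional list; the function names are hypothetical, and the claim applies the same idea to two-dimensional feature maps.

```python
def circulant(x):
    """Return C(x): row k is x cyclically shifted right by k positions."""
    n = len(x)
    return [[x[(j - k) % n] for j in range(n)] for k in range(n)]

def dense_response(x, w):
    """Evaluate the filter w on every cyclic shift of x (product C(x) w)."""
    return [sum(a * b for a, b in zip(row, w)) for row in circulant(x)]

samples = circulant([1, 2, 3])
responses = dense_response([1, 2, 3], [1, 0, 0])
```

In practice the product with C(x) is never formed explicitly, because a circulant matrix is diagonalized by the Fourier transform and the same result is obtained with element-wise operations in the frequency domain.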
Step 2.2: detection with all classifiers in the memory space
let w_{t-1}[i,l] denote the parameters, learned before the t-th frame, of the i-th classifier in the memory space corresponding to the l-th layer features, i ∈ {1, ..., m}, l ∈ L; the test samples C(x_t) are convolved with the classifier to obtain a response map, and the position of the maximum response value on the response map is taken as the target position;
the responses f_t[i,l] of the individual layer features are summed with fixed weights to obtain the response map f_t[i] of the i-th classifier in the memory space at frame t:
wherein the formula uses the inverse fast Fourier transform (IFFT) and the element-wise (dot) product operator, capital letters denote the Fourier transforms of the corresponding variables, and γ is the fusion weight; X_t[l] denotes the Fourier transform of the l-th layer features at frame t;
all classifiers in the memory space are convolved with the cyclic samples to obtain m response maps; the target position is estimated from the response map with the largest response peak, and the classifier corresponding to that response map is subsequently trained and updated:
where π is the index, in the memory space, of the classifier corresponding to the response map with the largest peak.
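The two operations of step 2.2, weighted fusion of the per-layer responses and selection of the classifier with the largest peak, can be sketched as follows. Response maps are plain nested lists here, the function names are hypothetical, and indices are 0-based whereas the claim counts classifiers from 1.

```python
def fuse_layers(layer_responses, weights):
    """Weighted sum over layers: f[i] = sum of gamma[l] * f[i, l], element-wise."""
    rows, cols = len(layer_responses[0]), len(layer_responses[0][0])
    fused = [[0.0] * cols for _ in range(rows)]
    for resp, gamma in zip(layer_responses, weights):
        for r in range(rows):
            for c in range(cols):
                fused[r][c] += gamma * resp[r][c]
    return fused

def peak_of(response):
    """Return (value, (row, col)) of the maximum of a 2-D response map."""
    return max((v, (r, c))
               for r, row in enumerate(response)
               for c, v in enumerate(row))

def select_classifier(responses):
    """Return (pi, position): the map with the largest peak and where it peaks."""
    peaks = [peak_of(resp) for resp in responses]
    pi = max(range(len(peaks)), key=lambda i: peaks[i][0])
    return pi, peaks[pi][1]

# Two classifiers; the first fuses two layers with fixed weights.
f0 = fuse_layers([[[0.1, 0.2], [0.3, 0.4]],
                  [[0.0, 0.0], [1.0, 0.0]]], [1.0, 0.5])
f1 = [[0.9, 0.1], [0.0, 0.0]]
pi, position = select_classifier([f0, f1])
```

Here the second classifier wins because its peak of 0.9 exceeds the fused peak of about 0.8 of the first, so the target is located at its peak position and that classifier is the one passed on to training and updating.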
3. The correlation filtering moving target tracking method based on a memory mechanism and convolution characteristics as claimed in claim 1, wherein the classifier updating method based on the adaptive fusion coefficient is as follows:
after the new classifier parameters w'_t are trained, the classifiers in the memory space are updated; the classifier w_{t-1}[π] is fused with w'_t by weighted fusion while the parameters of the remaining classifiers stay unchanged, according to the following formula:
wherein λ is the fusion coefficient of the classifier at the current frame, obtained adaptively with a Sigmoid function:
wherein λ increases monotonically with ρ, so that the more drastic the change of the target, the faster the classifier is updated; e is the base of the natural logarithm.
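A sketch of the update follows. The exact Sigmoid parameters and the fusion formula are not available in this text, so two assumptions are made and should be read as such: the coefficient is a logistic function of ρ with illustrative `steepness` and `midpoint` values, and the fusion is the linear interpolation (1 − λ)·w_{t-1}[π] + λ·w'_t. Function names are hypothetical, and classifier parameters are flat lists of numbers.

```python
import math

def fusion_coefficient(rho, steepness=10.0, midpoint=0.5):
    """Assumed Sigmoid mapping from rho to lambda in (0, 1), increasing in rho."""
    return 1.0 / (1.0 + math.exp(-steepness * (rho - midpoint)))

def update_memory(classifiers, pi, new_params, rho):
    """Fuse the new classifier into slot pi; every other slot stays unchanged."""
    lam = fusion_coefficient(rho)
    old = classifiers[pi]
    classifiers[pi] = [(1.0 - lam) * o + lam * n for o, n in zip(old, new_params)]
    return lam

memory = [[1.0, 1.0], [0.0, 0.0]]
lam = update_memory(memory, 1, [2.0, 2.0], rho=0.5)
```

With ρ at the assumed midpoint the coefficient is 0.5, so the selected classifier moves halfway toward the newly trained one; larger ρ pushes the coefficient toward 1 and the update toward the new classifier.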
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910478278.2A CN110276784B (en) | 2019-06-03 | 2019-06-03 | Correlation filtering moving target tracking method based on memory mechanism and convolution characteristics |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110276784A (en) | 2019-09-24 |
CN110276784B (en) | 2021-04-06 |
Family
ID=67961901
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910478278.2A Active CN110276784B (en) | 2019-06-03 | 2019-06-03 | Correlation filtering moving target tracking method based on memory mechanism and convolution characteristics |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110276784B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021241487A1 (en) * | 2020-05-25 | 2021-12-02 | 国立大学法人東北大学 | Timing prediction method, timing prediction device, timing prediction system, program, and construction machinery system |
CN112183493A (en) * | 2020-11-05 | 2021-01-05 | 北京澎思科技有限公司 | Target tracking method, device and computer readable storage medium |
CN113298846B (en) * | 2020-11-18 | 2024-02-09 | 西北工业大学 | Interference intelligent detection method based on time-frequency semantic perception |
CN113538512B (en) * | 2021-07-02 | 2024-09-06 | 北京理工大学 | Photoelectric information processing method based on multilayer rotation memory model |
CN115115992B (en) * | 2022-07-26 | 2022-11-15 | 中国科学院长春光学精密机械与物理研究所 | Multi-platform photoelectric auto-disturbance rejection tracking system and method based on brain map control right decision |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106530340A (en) * | 2016-10-24 | 2017-03-22 | 深圳市商汤科技有限公司 | Appointed object tracking method |
CN107016689A (en) * | 2017-02-04 | 2017-08-04 | 中国人民解放军理工大学 | A kind of correlation filtering of dimension self-adaption liquidates method for tracking target |
CN107146238A (en) * | 2017-04-24 | 2017-09-08 | 西安电子科技大学 | The preferred motion target tracking method of feature based block |
CN108549839A (en) * | 2018-03-13 | 2018-09-18 | 华侨大学 | The multiple dimensioned correlation filtering visual tracking method of self-adaptive features fusion |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100211830A1 (en) * | 2009-02-13 | 2010-08-19 | Seagate Technology Llc | Multi-input multi-output read-channel architecture for recording systems |
CN104574445B (en) * | 2015-01-23 | 2015-10-14 | 北京航空航天大学 | A kind of method for tracking target |
CN107767405B (en) * | 2017-09-29 | 2020-01-03 | 华中科技大学 | Nuclear correlation filtering target tracking method fusing convolutional neural network |
CN107818575A (en) * | 2017-10-27 | 2018-03-20 | 深圳市唯特视科技有限公司 | A kind of visual object tracking based on layering convolution |
Non-Patent Citations (2)
Title |
---|
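When Correlation Filters Meet Convolutional Neural Networks for Visual Tracking; Chao Ma et al.; IEEE Signal Processing Letters (Volume: 23, Issue: 10, Oct. 2016); 2016-10-31; Vol. 23, No. 10; pp. 1454-1458 * |
Response-adaptive tracking based on convolutional neural networks (基于卷积神经网络的响应自适应跟踪); Li Yong et al.; Chinese Journal of Liquid Crystals and Displays (液晶与显示); 2018-07-31; Vol. 33, No. 7; pp. 596-605 * |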
Also Published As
Publication number | Publication date |
---|---|
CN110276784A (en) | 2019-09-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110276784B (en) | Correlation filtering moving target tracking method based on memory mechanism and convolution characteristics | |
CN111354017B (en) | Target tracking method based on twin neural network and parallel attention module | |
CN110120064B (en) | Depth-related target tracking algorithm based on mutual reinforcement and multi-attention mechanism learning | |
CN109859241B (en) | Adaptive feature selection and time consistency robust correlation filtering visual tracking method | |
CN107016689A (en) | A kind of correlation filtering of dimension self-adaption liquidates method for tracking target | |
CN107154024A (en) | Dimension self-adaption method for tracking target based on depth characteristic core correlation filter | |
CN109325440B (en) | Human body action recognition method and system | |
CN111915644B (en) | Real-time target tracking method of twin guide anchor frame RPN network | |
CN107424177A (en) | Positioning amendment long-range track algorithm based on serial correlation wave filter | |
CN111582349B (en) | Improved target tracking algorithm based on YOLOv3 and kernel correlation filtering | |
CN110175649A (en) | It is a kind of about the quick multiscale estimation method for tracking target detected again | |
CN111612817A (en) | Target tracking method based on depth feature adaptive fusion and context information | |
CN110555870A (en) | DCF tracking confidence evaluation and classifier updating method based on neural network | |
CN116385945B (en) | Video interaction action detection method and system based on random frame complement and attention | |
CN110827327B (en) | Fusion-based long-term target tracking method | |
CN113920159B (en) | Infrared air small and medium target tracking method based on full convolution twin network | |
CN109272036B (en) | Random fern target tracking method based on depth residual error network | |
CN108257148B (en) | Target suggestion window generation method of specific object and application of target suggestion window generation method in target tracking | |
CN110751671B (en) | Target tracking method based on kernel correlation filtering and motion estimation | |
CN115482513A (en) | Apparatus and method for adapting a pre-trained machine learning system to target data | |
CN113033356B (en) | Scale-adaptive long-term correlation target tracking method | |
CN111145221A (en) | Target tracking algorithm based on multi-layer depth feature extraction | |
Masilamani et al. | Art classification with pytorch using transfer learning | |
CN116664623A (en) | Video target long-term tracking method based on twin network joint tracking and detection | |
CN116597275A (en) | High-speed moving target recognition method based on data enhancement |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||