CN113077490A - Multilayer depth feature target tracking method based on reliability - Google Patents
- Publication number
- CN113077490A CN113077490A CN202110330225.3A CN202110330225A CN113077490A CN 113077490 A CN113077490 A CN 113077490A CN 202110330225 A CN202110330225 A CN 202110330225A CN 113077490 A CN113077490 A CN 113077490A
- Authority
- CN
- China
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20048—Transform domain processing
- G06T2207/20056—Discrete and fast Fourier transform, [DFT, FFT]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
Abstract
The invention discloses a reliability-based multilayer depth feature target tracking method. Exploiting the differing discriminative power of features from different network layers in a tracking scene, the method measures the representation capability of each layer's features by computing channel reliability, and fuses the positioning information of the different layers to obtain more accurate target position information. During tracking, the target size is updated in real time with a scale-pool technique, and the features extracted from the previous frame are fused with the original template features to obtain a more robust tracking model. The invention improves the representation capability of the model and achieves more accurate target positioning; improves the generalization capability of the model; improves the tracking model's ability to cope with target scale change in complex scenes; and improves its robustness to changes in target appearance and to external interference in the tracking scene. Compared with commonly used existing algorithms, the method achieves higher tracking precision and success rate.
Description
Technical Field
The invention relates to a multilayer depth characteristic target tracking method based on reliability, and belongs to the technical field of network communication.
Background
Target tracking is a research hotspot in the field of computer vision and is widely applied in video surveillance, human-computer interaction, intelligent transportation and other fields. The goal of the visual tracking task is to detect a continuously moving object in an image sequence, obtain its motion information, extract its motion trajectory, and analyze its motion so as to understand the object's behavior. Owing to the diversity and complexity of tracking scenes, existing target tracking algorithms still discriminate and localize the target inaccurately, so further improving their performance is of great research significance.
Shallow features mainly capture low-level information such as shape, texture and color, and strongly affect positioning accuracy; deep features carry rich semantic information and are more robust to deformation, motion blur and other challenges in complex tracking scenes, but their low resolution loses much spatial detail. How to comprehensively exploit shallow and deep information in complex tracking scenes is therefore a problem in urgent need of a solution.
The prior art has the following defect: traditional correlation-filter-based target tracking methods represent the target with a single feature, whose expressive power for the current target is insufficient; in particular, when similar objects interfere in the background, the tracker cannot distinguish the target and tracking may even fail.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a multilayer depth feature target tracking method based on reliability, which utilizes the difference of the discrimination capability of different layer features in a tracking scene and adopts channel reliability to fuse the positioning information of the different layer features; in the tracking process, the target size is updated in real time by adopting a scale pool technology, and the features extracted from the previous frame are fused with the original template features to obtain a tracking model with stronger robustness.
In order to achieve the purpose, the invention adopts the following technical scheme: a multilayer depth feature target tracking method based on reliability comprises the following steps:
step S1, take the first-frame input picture I_1 of the video sequence with its target frame, and the subsequent input pictures I_t, t ∈ [2, N], where N is the total frame number of the video sequence, (x, y) are the coordinates of the target centre point, and c = (c_w, c_h) is the target scale;
step S2, extract the target features of the current frame with the trained VGG-Net-19 network; according to the network characteristics, the outputs of the three layers conv3-4, conv4-4 and conv5-4 are extracted to describe the target, denoted f_d, d = 1, 2, 3;
step S3, use f_d, d = 1, 2, 3 as input features of the correlation filter to obtain the corresponding response maps R_d, d = 1, 2, 3;
step S4, for each response map R_d, d = 1, 2, 3, solve four indexes: the peak value, the peak-to-sidelobe ratio, the average peak-to-correlation energy, and the ratio of the secondary main peak to the main peak, denoted r_max(R_d), r_PSR(R_d), r_APCE(R_d) and r_RSFMP(R_d), d = 1, 2, 3;
step S5, obtain the reliability of the different input features and normalize it, the reliability being k_d, d = 1, 2, 3;
S6, fusing positioning information of the three input characteristics according to reliability weighting to obtain a final response graph R and obtain target positioning information;
step S7, estimate and update the target scale with the scale-pool technique to obtain the latest target scale c = (c_w, c_h);
Step S8, updating the target model;
and step S9, repeating the steps S2-S8 until all the frames of the current sequence are tracked to the end.
Further, solving the response maps R_d, d = 1, 2, 3 in step S3 is divided into a training part and a detection part, with the following concrete steps:
step S31, with f_d, d = 1, 2, 3 as the input feature of the correlation filter, train the filter parameters by solving arg min_{w_d} ||w_d * f_d - y||² + λ||w_d||², where * is the correlation operation, w_d are the correlation-filter parameters, y is a standard two-dimensional Gaussian distribution label, and λ is the regularization coefficient, taken as 0.0001; the filter parameters are obtained in the Fourier domain as W_d = ((F_d)^H ⊙ Y) / ((F_d)^H ⊙ F_d + λ), where F_d is the Fourier transform of f_d, (F_d)^H is the conjugate of F_d, Y is the Fourier transform of y, and Y^H is the conjugate of Y;
step S32, the response map of f_d is obtained by inverse Fourier transform as R_d = F^{-1}(W_d ⊙ Z_d), where F^{-1} is the inverse Fourier transform, Z_d is the Fourier transform of the feature representation z of the candidate region, and ⊙ is the dot-product operation.
Further, the concrete steps of solving the four indexes of the response map in step S4 are:
step S41, the response-map peak r_max(R_d) is the maximum of the response map, r_max(R_d) = max(R_d); the larger the peak, the stronger the discriminative power of the main peak;
step S42, the peak-to-sidelobe ratio is computed as r_PSR(R_d) = (r_max(R_d) - μ(R_d)) / σ(R_d), where μ(R_d) is the mean of the response map R_d and σ(R_d) is its standard deviation; the larger r_PSR(R_d), the more reliable the response map;
step S43, the average peak-to-correlation energy r_APCE(R_d) reflects the average fluctuation of the response map and the confidence of the detected target, and is computed as r_APCE(R_d) = |r_max(R_d) - min(R_d)|² / mean(Σ_{i,j}(R_d(i, j) - min(R_d))²), where min(R_d) is the minimum of the response map and R_d(i, j) is the response value at coordinate (i, j);
step S44, the ratio of the secondary main peak to the main peak, r_RSFMP(R_d), measures the prominence of the main mode of the response map; the larger r_RSFMP(R_d), the more prominent the main peak. It is computed as r_RSFMP(R_d) = 1 - min(r_peak2(R_d)/r_peak1(R_d), 0.5), where r_peak1(R_d) is the peak of the first main peak, r_peak1(R_d) = r_max(R_d), and r_peak2(R_d) is the peak of the second main peak; the first and second main peaks are not adjacent.
Further, the concrete steps of obtaining and normalizing the reliability of the different input features in step S5 are:
step S51, compute the reliability of each channel, k_d' = r_max(R_d)·r_PSR(R_d)·r_APCE(R_d)·r_RSFMP(R_d), d = 1, 2, 3;
step S52, normalize the computed channel reliabilities: k_d = k_d' / Σ_{i=1}^{3} k_i'.
Further, the response map fused in step S6 is R = Σ_{d=1}^{3} k_d R_d, and the final positioning information is obtained as the location of the peak of the final confidence map.
Further, in step S7 the target scale is estimated with the scale-pool technique. The scale pool is set to S = {s_1, s_2, ..., s_k}; that is, during tracking the candidate sizes {s_i c | s_i ∈ S} are evaluated and the one with the maximum reliability coefficient is selected as the current-frame size, with the following concrete calculation steps:
step S71, set the scale pool to S = {0.8, 0.85, 0.9, 0.95, 1.0, 1.05, 1.1, 1.15, 1.2}; each candidate size is evaluated with the correlation filter and the best-performing size is selected;
step S72, to keep the target size stable, it is updated smoothly: c_{t+1} = (1 - γ)c_t + γc'_{t+1}, where c'_{t+1} is the size estimated in step S71 and γ is the target-size learning rate, taken as 0.2.
Further, updating the target model in step S8 divides the correlation filter into a numerator part and a denominator part that are updated separately, W_d^t = A_d^t / B_d^t, where A_d^t is the numerator part and B_d^t is the denominator part, with the following concrete updating steps:
step S81, update the numerator part, A_d^t = (1 - β)A_d^{t-1} + β(F_d^t)^H ⊙ Y, d = 1, 2, 3, where β is the learning rate of the correlation-filter model, taken as 0.01;
step S82, update the denominator part analogously, B_d^t = (1 - β)B_d^{t-1} + β((F_d^t)^H ⊙ F_d^t + λ), d = 1, 2, 3.
compared with the prior art, the invention has the following advantages:
firstly, the shallow structure characteristics and the deep structure characteristics in the VGG-Net-19 network are used at the same time, so that the representation capability of the model is improved, and more accurate target positioning is realized;
secondly, the reliability of each input characteristic response graph is comprehensively measured by introducing different indexes, and the generalization capability of the model is improved;
thirdly, a scale pool technology is introduced in the target tracking process, so that the capability of a tracking model for coping with target scale change in a complex scene is improved;
fourthly, a template updating mechanism is adopted in the target tracking process, so that the robustness of a tracking model for responding to target representation change and external interference in a tracking scene is improved;
finally, through the cooperation of all links, the tracking accuracy and the success rate are better compared with the existing common comparison algorithm.
Drawings
FIG. 1 is a schematic structural view of the present invention;
FIG. 2 is a flow chart of the present invention;
FIG. 3 is a schematic diagram of the output of each layer of the VGG-Net-19 network;
FIG. 4 is a simulation diagram comparing the comprehensive precision of the present invention and existing common target tracking algorithms on the OTB100 standard data set;
fig. 5 is a simulation diagram comparing the tracking success rate of the present invention and the existing common target tracking algorithm on the OTB100 standard data set.
Detailed Description
The technical solutions in the implementation of the present invention will be made clear and fully described below with reference to the accompanying drawings, and the described embodiments are only a part of the embodiments of the present invention, but not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
As shown in fig. 1 to 5, the method for tracking a multilayer depth feature target based on reliability provided by the present invention includes the following steps:
step S1, take the first-frame input picture I_1 of the video sequence with its target frame, and the subsequent input pictures I_t, t ∈ [2, N], where N is the total frame number of the video sequence, (x, y) are the coordinates of the target centre point, and c = (c_w, c_h) is the target scale;
step S2, extract the target features of the current frame with the trained VGG-Net-19 network; according to the network characteristics, the outputs of the three layers conv3-4, conv4-4 and conv5-4 are extracted to describe the target, denoted f_d, d = 1, 2, 3;
step S3, use f_d, d = 1, 2, 3 as input features of the correlation filter to obtain the corresponding response maps R_d, d = 1, 2, 3; the whole solving process is divided into a training part and a detection part, with the following concrete steps:
step S31, with f_d, d = 1, 2, 3 as the input feature of the correlation filter, train the filter parameters by solving arg min_{w_d} ||w_d * f_d - y||² + λ||w_d||², where * is the correlation operation, w_d are the correlation-filter parameters, y is a standard two-dimensional Gaussian distribution label, and λ is the regularization coefficient, taken as 0.0001; the filter parameters are obtained in the Fourier domain as W_d = ((F_d)^H ⊙ Y) / ((F_d)^H ⊙ F_d + λ), where F_d is the Fourier transform of f_d, (F_d)^H is the conjugate of F_d, Y is the Fourier transform of y, and Y^H is the conjugate of Y;
step S32, the response map of f_d is obtained by inverse Fourier transform as R_d = F^{-1}(W_d ⊙ Z_d), where F^{-1} is the inverse Fourier transform, Z_d is the Fourier transform of the feature representation z of the candidate region, and ⊙ is the dot-product operation;
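Steps S31-S32 can be sketched in NumPy for a single feature channel. This is a minimal toy illustration, not the patent's implementation: the 64x64 patch size, the Gaussian label width, and the random stand-in feature values are assumptions.

```python
import numpy as np

def train_filter(f, y, lam=1e-4):
    """Closed-form correlation filter in the Fourier domain (one channel):
    W = (F^H . Y) / (F^H . F + lam), computed element-wise."""
    F = np.fft.fft2(f)
    Y = np.fft.fft2(y)
    return (np.conj(F) * Y) / (np.conj(F) * F + lam)

def detect(Wf, z):
    """Response map R = F^{-1}(W . Z) for a candidate patch z."""
    return np.real(np.fft.ifft2(Wf * np.fft.fft2(z)))

# toy usage: a 2-D Gaussian label centred in a 64x64 patch
np.random.seed(0)
h, w = 64, 64
yy, xx = np.mgrid[0:h, 0:w]
y = np.exp(-(((yy - h // 2) ** 2 + (xx - w // 2) ** 2) / (2 * 3.0 ** 2)))
f = np.random.rand(h, w)   # stand-in for one deep-feature channel
Wf = train_filter(f, y)
R = detect(Wf, f)          # detecting on the training patch itself
# the response peaks at the label centre, here (32, 32)
```

The regularizer lam keeps the per-frequency division well conditioned; with a small lam the response on the training patch reproduces the Gaussian label almost exactly.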
step S4, for each response map R_d, d = 1, 2, 3, solve four indexes: the peak value, the peak-to-sidelobe ratio, the average peak-to-correlation energy, and the ratio of the secondary main peak to the main peak, denoted r_max(R_d), r_PSR(R_d), r_APCE(R_d) and r_RSFMP(R_d), d = 1, 2, 3; the concrete steps of solving these four performance indexes are:
step S41, the response-map peak r_max(R_d) is the maximum of the response map, r_max(R_d) = max(R_d); the larger the peak, the stronger the discriminative power of the main peak;
step S42, the peak-to-sidelobe ratio is computed as r_PSR(R_d) = (r_max(R_d) - μ(R_d)) / σ(R_d), where μ(R_d) is the mean of the response map R_d and σ(R_d) is its standard deviation; the larger r_PSR(R_d), the more reliable the response map;
step S43, the average peak-to-correlation energy r_APCE(R_d) reflects the average fluctuation of the response map and the confidence of the detected target, and is computed as r_APCE(R_d) = |r_max(R_d) - min(R_d)|² / mean(Σ_{i,j}(R_d(i, j) - min(R_d))²), where min(R_d) is the minimum of the response map and R_d(i, j) is the response value at coordinate (i, j);
step S44, the ratio of the secondary main peak to the main peak, r_RSFMP(R_d), measures the prominence of the main mode of the response map; the larger r_RSFMP(R_d), the more prominent the main peak. It is computed as r_RSFMP(R_d) = 1 - min(r_peak2(R_d)/r_peak1(R_d), 0.5), where r_peak1(R_d) is the peak of the first main peak, r_peak1(R_d) = r_max(R_d), and r_peak2(R_d) is the peak of the second main peak; the first and second main peaks are not adjacent;
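The four indexes of steps S41-S44 can be sketched in NumPy as below. The 5x5 mask used to locate the non-adjacent second main peak is an assumption; the patent only requires that the two peaks not be adjacent, without fixing a neighbourhood size.

```python
import numpy as np

def response_indices(R):
    """Return (r_max, r_PSR, r_APCE, r_RSFMP) for a response map R."""
    rmax, rmin = R.max(), R.min()
    # peak-to-sidelobe ratio: (peak - mean) / std
    r_psr = (rmax - R.mean()) / (R.std() + 1e-12)
    # average peak-to-correlation energy
    r_apce = (rmax - rmin) ** 2 / (np.mean((R - rmin) ** 2) + 1e-12)
    # second main peak: mask a 5x5 window around the main peak (assumption)
    i, j = np.unravel_index(np.argmax(R), R.shape)
    Rm = R.copy()
    Rm[max(0, i - 2):i + 3, max(0, j - 2):j + 3] = rmin
    r_rsfmp = 1.0 - min(Rm.max() / (rmax + 1e-12), 0.5)
    return rmax, r_psr, r_apce, r_rsfmp

# a clean single-peak map scores high on every index
yy, xx = np.mgrid[0:64, 0:64]
R = np.exp(-(((yy - 20) ** 2 + (xx - 40) ** 2) / (2 * 2.0 ** 2)))
rmax, r_psr, r_apce, r_rsfmp = response_indices(R)
```

Note that with the min(., 0.5) clamp, r_RSFMP always lies in [0.5, 1.0], so a flat multi-modal map is floored at 0.5 rather than driven to zero.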
step S5, obtain the reliability of the different input features and normalize it, the reliability being k_d, d = 1, 2, 3; the whole process is divided into reliability solving and normalization, with the following detailed steps:
step S51, compute the reliability of each channel, k_d' = r_max(R_d)·r_PSR(R_d)·r_APCE(R_d)·r_RSFMP(R_d), d = 1, 2, 3;
step S52, normalize the computed channel reliabilities: k_d = k_d' / Σ_{i=1}^{3} k_i';
step S6, fuse the positioning information of the three input features by reliability weighting to obtain the final response map R and the target positioning information; the fused response map is R = Σ_{d=1}^{3} k_d R_d, and the final positioning information is the location of the peak of R;
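Steps S51-S52 and S6 together amount to a reliability-weighted sum of response maps. A toy NumPy sketch follows; the compact reliability product and the 5x5 second-peak mask are assumptions, and the noisy channels are synthetic stand-ins for unreliable layers.

```python
import numpy as np

def channel_reliability(R):
    """k_d' = r_max * r_PSR * r_APCE * r_RSFMP for one response map."""
    rmax, rmin = R.max(), R.min()
    psr = (rmax - R.mean()) / (R.std() + 1e-12)
    apce = (rmax - rmin) ** 2 / (np.mean((R - rmin) ** 2) + 1e-12)
    i, j = np.unravel_index(np.argmax(R), R.shape)
    Rm = R.copy()
    Rm[max(0, i - 2):i + 3, max(0, j - 2):j + 3] = rmin
    rsfmp = 1.0 - min(Rm.max() / (rmax + 1e-12), 0.5)
    return rmax * psr * apce * rsfmp

def fuse(maps):
    """R = sum_d k_d R_d with k_d = k_d' / sum_i k_i'.
    Returns the fused map and its peak position (the target location)."""
    k = np.array([channel_reliability(R) for R in maps])
    k = k / k.sum()
    R = np.tensordot(k, np.stack(maps), axes=1)
    return R, np.unravel_index(np.argmax(R), R.shape)

np.random.seed(1)
yy, xx = np.mgrid[0:64, 0:64]
sharp = np.exp(-(((yy - 10) ** 2 + (xx - 10) ** 2) / (2 * 2.0 ** 2)))
noisy1 = 0.3 * np.random.rand(64, 64)   # unreliable channels
noisy2 = 0.3 * np.random.rand(64, 64)
R, pos = fuse([sharp, noisy1, noisy2])  # the sharp channel dominates
```

Because the sharp channel's reliability product dwarfs that of the noisy channels, the fused peak coincides with the sharp channel's peak.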
step S7, estimate and update the target scale with the scale-pool technique to obtain the latest target scale c = (c_w, c_h); the scale pool is set to S = {s_1, s_2, ..., s_k}, i.e., during tracking the candidate sizes {s_i c | s_i ∈ S} are evaluated and the one with the maximum reliability coefficient is selected as the current-frame size, with the following concrete calculation steps:
step S71, set the scale pool to S = {0.8, 0.85, 0.9, 0.95, 1.0, 1.05, 1.1, 1.15, 1.2}; each candidate size is evaluated with the correlation filter and the best-performing size is selected;
step S72, to keep the target size stable, it is updated smoothly: c_{t+1} = (1 - γ)c_t + γc'_{t+1}, where c'_{t+1} is the size estimated in step S71 and γ is the target-size learning rate, taken as 0.2;
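Steps S71-S72 can be sketched as follows. Here `score_fn` is an assumed callback standing in for the reliability of the filter response at a candidate size; the patent evaluates each size with the correlation filter itself.

```python
SCALE_POOL = [0.8, 0.85, 0.9, 0.95, 1.0, 1.05, 1.1, 1.15, 1.2]

def update_scale(score_fn, c, gamma=0.2):
    """Evaluate every candidate size s*c from the scale pool, pick the
    best one, then smooth: c_{t+1} = (1 - gamma) * c_t + gamma * c_best."""
    cw, ch = c
    best = max(SCALE_POOL, key=lambda s: score_fn((s * cw, s * ch)))
    return ((1 - gamma) * cw + gamma * best * cw,
            (1 - gamma) * ch + gamma * best * ch)

# toy score that prefers a width of 110 -> best scale 1.1 for c = (100, 50)
c_next = update_scale(lambda size: -abs(size[0] - 110.0), (100.0, 50.0))
# c_next is approximately (102.0, 51.0): an 80/20 blend of old and new size
```

The small learning rate gamma = 0.2 damps frame-to-frame size jitter, trading responsiveness for stability.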
step S8, because filter training is affected by target deformation, scale change, occlusion and other factors, drift can occur, so the target model needs to be updated; specifically, the correlation filter is divided into a numerator part and a denominator part that are updated separately, W_d^t = A_d^t / B_d^t, where A_d^t is the numerator part and B_d^t is the denominator part, with the following detailed steps:
step S81, update the numerator part, A_d^t = (1 - β)A_d^{t-1} + β(F_d^t)^H ⊙ Y, d = 1, 2, 3, where β is the learning rate of the correlation-filter model, taken as 0.01;
step S82, update the denominator part analogously, B_d^t = (1 - β)B_d^{t-1} + β((F_d^t)^H ⊙ F_d^t + λ), d = 1, 2, 3;
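A minimal NumPy sketch of the numerator/denominator running update. Note the denominator update is inferred by symmetry with the closed-form filter solution; the patent text only spells out the numerator step, so that line is an assumption.

```python
import numpy as np

def update_filter(A_prev, B_prev, f, y, beta=0.01, lam=1e-4):
    """Running update of the filter's numerator A and denominator B:
        A_t = (1 - beta) * A_{t-1} + beta * (F^H . Y)
        B_t = (1 - beta) * B_{t-1} + beta * (F^H . F + lam)   # assumed
    Returns the updated parts and the current filter W_t = A_t / B_t."""
    F = np.fft.fft2(f)
    Y = np.fft.fft2(y)
    A = (1 - beta) * A_prev + beta * (np.conj(F) * Y)
    B = (1 - beta) * B_prev + beta * (np.conj(F) * F + lam)
    return A, B, A / B

np.random.seed(2)
f = np.random.rand(32, 32)
y = np.random.rand(32, 32)
A0 = np.zeros((32, 32), dtype=complex)
B0 = np.ones((32, 32), dtype=complex)
# with beta = 1 the update reduces to training from scratch on (f, y)
A1, B1, W1 = update_filter(A0, B0, f, y, beta=1.0)
```

Updating numerator and denominator separately, rather than averaging the filter W directly, keeps the running estimate consistent with the ridge-regression solution.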
and step S9, repeating the steps S2-S8 until all the frames of the current sequence are tracked to the end.
Steps S1-S6 form the target tracking process, step S7 the target-scale updating process, and step S8 the tracking-model updating process; together they constitute the complete target tracking procedure. In actual tracking, steps S1-S8 are repeated to complete the whole tracking task, and the target position information is obtained from steps S6 and S7.
To verify the effectiveness of the proposed multi-layer depth feature target tracking method (A Multiple Deep Feature Tracking Algorithm Based on Reliability, MDPR), comparison experiments were carried out on the OTB100 data set against currently common target tracking algorithms, specifically:
Comparison algorithm 2, CSK (Henriques J F, Caseiro R, Martins P, et al. Exploiting the Circulant Structure of Tracking-by-Detection with Kernels [C]// Proceedings of European Conference on Computer Vision, 2012: 702-);
Comparison algorithm 3, MOSSE_CA (Bolme D S, Beveridge J R, Draper B A, et al. Visual object tracking using adaptive correlation filters [C]// 2010 IEEE Conference on Computer Vision and Pattern Recognition, 2010: 2544-);
Comparison algorithm 4, SAMF (Li Y, Zhu J. A Scale Adaptive Kernel Correlation Filter Tracker with Feature Integration [C]// Proceedings of European Conference on Computer Vision, 2014: 254-);
Comparison algorithm 5, DCF_CA (Henriques J F, Caseiro R, Martins P, et al. High-Speed Tracking with Kernelized Correlation Filters [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(3): 583-);
Comparison algorithm 6, RPT (Li Y, Zhu J, Hoi S C H. Reliable Patch Trackers: Robust visual tracking by exploiting reliable patches [C]// Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2015: 353-361.);
Comparison algorithm 7, KCF_MTSA (Bibi A, Ghanem B. Multi-template Scale-Adaptive Kernelized Correlation Filters [C]// Proceedings of IEEE International Conference on Computer Vision Workshops, 2015: 613-).
The comparison simulation experiments use quantitative analysis, i.e., tracking performance is judged by computing evaluation indexes. The indexes adopted are tracking precision and tracking success rate; the corresponding results are shown in Fig. 4 and Fig. 5. In Fig. 4 the abscissa is the threshold on the distance between the centre of the target position estimated by an algorithm and the manually annotated target centre, and the ordinate is the fraction of frames whose centre error is below that threshold, i.e., the precision; in Fig. 5 the abscissa is the overlap threshold between the bounding box estimated by an algorithm and the manually annotated bounding box, and the ordinate is the fraction of frames whose overlap exceeds that threshold, i.e., the success rate.
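The two metrics can be computed as below. This is a generic sketch under the usual definitions; the official OTB toolkit additionally sweeps the thresholds to produce the full curves shown in the figures.

```python
import numpy as np

def precision(pred_centers, gt_centers, threshold=20.0):
    """Fraction of frames whose centre location error is <= threshold px."""
    d = np.linalg.norm(np.asarray(pred_centers, float)
                       - np.asarray(gt_centers, float), axis=1)
    return float(np.mean(d <= threshold))

def success(pred_boxes, gt_boxes, threshold=0.5):
    """Fraction of frames whose IoU with the ground truth is >= threshold.
    Boxes are (x, y, w, h)."""
    p = np.asarray(pred_boxes, float)
    g = np.asarray(gt_boxes, float)
    x1 = np.maximum(p[:, 0], g[:, 0])
    y1 = np.maximum(p[:, 1], g[:, 1])
    x2 = np.minimum(p[:, 0] + p[:, 2], g[:, 0] + g[:, 2])
    y2 = np.minimum(p[:, 1] + p[:, 3], g[:, 1] + g[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    union = p[:, 2] * p[:, 3] + g[:, 2] * g[:, 3] - inter
    return float(np.mean(inter / union >= threshold))
```

For example, with one frame whose centre error is 2 px and one whose error is 40 px, precision at the 20 px threshold is 0.5.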
Combining Fig. 4 and Fig. 5, the proposed MDPR shows better tracking precision and success rate on the OTB100 data set than the above comparison algorithms, with a comprehensive precision of 79.3% and a success rate of 75.1%.
In conclusion, in complex scenes involving illumination change, rotation, scale change and the like, the invention measures the representation capabilities of different layer features by computing channel reliability, and then fuses their positioning information to obtain more accurate target position information; during tracking, the target size is updated in real time with the scale-pool technique, and the features extracted from the previous frame are fused with the original template features to obtain a more robust tracking model.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. Furthermore, it should be understood that although the present description refers to embodiments, not every embodiment may contain only a single embodiment, and such description is for clarity only, and those skilled in the art should make the description as a whole, and the embodiments may be appropriately combined to form other embodiments understood by those skilled in the art.
Claims (7)
1. A multilayer depth feature target tracking method based on reliability is characterized by comprising the following steps:
step S1, take the first-frame input picture I_1 of the video sequence with its target frame, and the subsequent input pictures I_t, t ∈ [2, N], where N is the total frame number of the video sequence, (x, y) are the coordinates of the target centre point, and c = (c_w, c_h) is the target scale;
step S2, extract the target features of the current frame with the trained VGG-Net-19 network; according to the network characteristics, the outputs of the three layers conv3-4, conv4-4 and conv5-4 are extracted to describe the target, denoted f_d, d = 1, 2, 3;
step S3, use f_d, d = 1, 2, 3 as input features of the correlation filter to obtain the corresponding response maps R_d, d = 1, 2, 3;
step S4, for each response map R_d, d = 1, 2, 3, solve four indexes: the peak value, the peak-to-sidelobe ratio, the average peak-to-correlation energy, and the ratio of the secondary main peak to the main peak, denoted r_max(R_d), r_PSR(R_d), r_APCE(R_d) and r_RSFMP(R_d), d = 1, 2, 3;
step S5, obtain the reliability of the different input features and normalize it, the reliability being k_d, d = 1, 2, 3;
step S6, fuse the positioning information of the three input features by reliability weighting to obtain the final response map R and the target positioning information;
step S7, estimate and update the target scale with the scale-pool technique to obtain the latest target scale c = (c_w, c_h);
step S8, update the target model;
step S9, repeat steps S2-S8 until all frames of the current sequence have been tracked.
2. The reliability-based multilayer depth feature target tracking method according to claim 1, wherein solving the response maps R_d, d = 1, 2, 3 in step S3 is divided into a training part and a detection part, with the following concrete steps:
step S31, with f_d, d = 1, 2, 3 as the input feature of the correlation filter, train the filter parameters by solving arg min_{w_d} ||w_d * f_d - y||² + λ||w_d||², where * is the correlation operation, w_d are the correlation-filter parameters, y is a standard two-dimensional Gaussian distribution label, and λ is the regularization coefficient, taken as 0.0001; the filter parameters are obtained as W_d = ((F_d)^H ⊙ Y) / ((F_d)^H ⊙ F_d + λ), where F_d is the Fourier transform of f_d, (F_d)^H is the conjugate of F_d, Y is the Fourier transform of y, and Y^H is the conjugate of Y;
3. The reliability-based multi-layer depth feature target tracking method according to claim 1, wherein the solving of the four indexes of the response map in the step S4 includes:
step S41, response diagram peak value rmax(Rd) In response to the maximum indicator of the map, rmax(Rd)=max(Rd) The larger the peak value is, the stronger the resolution of the main peak is;
Step S42, Peak to sidelobe ratio rPSR(Rd) The calculation formula of (2) is as follows:wherein, mu (R)d) Is a response graph RdMean value of (a), (b), (c), (dd) Is a response graph RdStandard deviation of (a), rPSR(Rd) The larger the response map is, the more reliable the response map is;
step S43, average peak correlation energy rAPCE(Rd) The average peak correlation energy index of the response map is used for reflecting the average fluctuation degree of the response map and the confidence level of the detected target, and the calculation formula isWherein, min (R)d) In response to the minimum value, Rd(i, j) is a response value with the coordinate position (i, j) in the response map;
step S44, the ratio of the second main peak to the main peak, r_RSFMP(R_d), measures how prominent the dominant mode of the response map is; the larger r_RSFMP(R_d) is, the more prominent the main peak; it is calculated as r_RSFMP(R_d) = 1 - min(r_peak2(R_d) / r_peak1(R_d), 0.5), wherein r_peak1(R_d) is the peak of the first main peak in the response map, r_peak1(R_d) = r_max(R_d), and r_peak2(R_d) is the peak of the second main peak; the first and second main peaks are non-adjacent.
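The four indices of steps S41-S44 can be computed from a response map as below. This is a sketch that assumes the second main peak is taken as the largest value outside a small window around the main peak; the claim only requires the two peaks to be non-adjacent, so the `exclude` window size is an assumption:

```python
import numpy as np

def response_indices(R, exclude=2):
    """Four reliability indices of a response map R (steps S41-S44)."""
    r_max = float(R.max())                                # S41: peak value
    mu, sigma = R.mean(), R.std()
    r_psr = (r_max - mu) / (sigma + 1e-12)                # S42: peak-to-sidelobe ratio
    r_min = float(R.min())
    r_apce = (r_max - r_min) ** 2 / np.mean((R - r_min) ** 2)  # S43: APCE
    # S44: second main peak = max outside a (2*exclude+1)^2 window at the main peak
    i, j = np.unravel_index(np.argmax(R), R.shape)
    masked = R.copy()
    masked[max(i - exclude, 0):i + exclude + 1,
           max(j - exclude, 0):j + exclude + 1] = r_min
    r_peak2 = float(masked.max())
    r_rsfmp = 1.0 - min(r_peak2 / r_max, 0.5)
    return r_max, r_psr, r_apce, r_rsfmp
```

For a clean unimodal response all four indices are high; a noisy or multi-modal response drives the PSR, APCE, and RSFMP values down, which is what the reliability weighting in claim 4 exploits.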
4. The reliability-based multi-layer depth feature target tracking method according to claim 1, wherein the step S5 of obtaining the reliability of different input features and normalizing comprises the following specific steps:
step S51, calculating the reliability of each channel, k'_d = r_max(R_d) · r_PSR(R_d) · r_APCE(R_d) · r_RSFMP(R_d), d = 1, 2, 3;
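Step S51 multiplies the four indices per layer. The normalization of step S52 is not reproduced in this excerpt, so the sketch below assumes the common choice of dividing by the sum, which makes the weights add to one:

```python
import numpy as np

def channel_reliabilities(indices):
    """indices: list of (r_max, r_psr, r_apce, r_rsfmp) tuples, one per layer d.
    Returns normalized reliability weights k_d (S51 product; sum-normalization
    for S52 is an assumption of this sketch)."""
    k = np.array([m * p * a * s for (m, p, a, s) in indices], dtype=float)
    return k / k.sum()
```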
6. The target tracking method according to claim 1, wherein in step S7 a scale pool technique is used to estimate the target scale; the scale pool is set to S = {s_1, s_2, ..., s_k}, i.e., during tracking the candidate sizes s_i·c, s_i ∈ S, are evaluated and the one with the largest reliability coefficient is selected as the size of the current frame, with the following concrete calculation steps:
step S71, the scale pool is set to S = {0.8, 0.85, 0.9, 0.95, 1.0, 1.05, 1.1, 1.15, 1.2}; each of these scales is evaluated separately, and the best-performing size is selected;
step S72, to keep the target size stable, the size is updated smoothly, c_{t+1} = (1 - γ)·c_t + γ·ĉ_{t+1}, wherein ĉ_{t+1} is the size estimated in step S71 and γ is the target-size learning rate, taken as 0.2.
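Steps S71-S72 can be sketched as follows. The per-scale scoring formula is not reproduced in this excerpt, so it is left as a caller-supplied `score_fn` (an assumption of this sketch, not the claim's exact criterion):

```python
import numpy as np

SCALE_POOL = (0.8, 0.85, 0.9, 0.95, 1.0, 1.05, 1.1, 1.15, 1.2)  # step S71

def estimate_scale(score_fn, c_t, gamma=0.2):
    """Evaluate each candidate size s_i * c_t and keep the best one (S71),
    then smooth the update with learning rate gamma = 0.2 (S72)."""
    c_t = np.asarray(c_t, dtype=float)
    best = max((s * c_t for s in SCALE_POOL),
               key=lambda c: score_fn(tuple(c)))
    return (1.0 - gamma) * c_t + gamma * best
```

The smoothing in S72 damps frame-to-frame jitter: even when the best candidate jumps a scale step, the tracked size only moves 20% of the way toward it per frame.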
7. The target tracking method according to claim 1, wherein the updating of the target model in step S8 splits the correlation filter into a denominator part and a numerator part that are updated separately, W_d = A_d / B_d, wherein A_d is the numerator part and B_d is the denominator part; the concrete updating steps are as follows:
step S81, updating the numerator part, A_d^t = (1 - β)·A_d^{t-1} + β·(Y^H ⊙ F_d^t), wherein β is the learning rate of the correlation filter model, taken as 0.01;
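The numerator update of step S81 can be sketched as below. The denominator update (step S82) does not appear in this excerpt, so its symmetric MOSSE-style form here is an assumption:

```python
import numpy as np

def update_model(A_prev, B_prev, F_t, Y, beta=0.01, lam=1e-4):
    """Running update of the correlation filter's numerator/denominator parts.

    A_prev, B_prev : numerator/denominator from the previous frame
    F_t : Fourier transform of the current frame's feature channel
    Y   : Fourier transform of the Gaussian label
    """
    A_t = (1.0 - beta) * A_prev + beta * (np.conj(Y) * F_t)          # S81: numerator
    B_t = (1.0 - beta) * B_prev + beta * (np.conj(F_t) * F_t + lam)  # assumed S82
    return A_t, B_t, A_t / B_t   # W_d = numerator / denominator
```

Updating numerator and denominator separately (rather than averaging W_d itself) keeps the running estimate equivalent to a filter trained on an exponentially weighted history of frames.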
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110330225.3A CN113077490A (en) | 2021-03-29 | 2021-03-29 | Multilayer depth feature target tracking method based on reliability |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113077490A true CN113077490A (en) | 2021-07-06 |
Family
ID=76611161
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110330225.3A Pending CN113077490A (en) | 2021-03-29 | 2021-03-29 | Multilayer depth feature target tracking method based on reliability |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113077490A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114359330A (en) * | 2021-11-01 | 2022-04-15 | 中国人民解放军陆军工程大学 | Long-term target tracking method and system fusing depth information |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107943837A (en) * | 2017-10-27 | 2018-04-20 | 江苏理工学院 | A kind of video abstraction generating method of foreground target key frame |
CN109410247A (en) * | 2018-10-16 | 2019-03-01 | 中国石油大学(华东) | A kind of video tracking algorithm of multi-template and adaptive features select |
CN111311647A (en) * | 2020-01-17 | 2020-06-19 | 长沙理工大学 | Target tracking method and device based on global-local and Kalman filtering |
Non-Patent Citations (4)
Title |
---|
小小菜鸟一只: "Target tracking algorithm HCF: Hierarchical Convolutional Features for Visual Tracking", pages 1 - 6, Retrieved from the Internet <URL:https://blog.csdn.net/crazyice521/article/details/65935753> *
Yin Mingfeng et al.: "Multi-feature visual tracking based on weighted spatio-temporal context learning", Journal of Chinese Inertial Technology, vol. 27, no. 1, pages 43 - 50 *
Yin Mingfeng et al.: "Particle filter visual tracking based on an improved spatial-histogram similarity measure (in English)", Journal of Chinese Inertial Technology, vol. 26, no. 3, pages 359 - 365 *
Yin Mingfeng et al.: "Multi-scale background-aware correlation filter tracking algorithm based on channel reliability", Acta Optica Sinica, vol. 39, no. 5, pages 1 - 11 *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110363122B (en) | Cross-domain target detection method based on multi-layer feature alignment | |
CN108921873B (en) | Markov decision-making online multi-target tracking method based on kernel correlation filtering optimization | |
CN104200495B (en) | A kind of multi-object tracking method in video monitoring | |
CN110490913B (en) | Image matching method based on feature description operator of corner and single line segment grouping | |
CN107633226B (en) | Human body motion tracking feature processing method | |
CN111553425B (en) | Template matching LSP algorithm, medium and equipment for visual positioning | |
CN111582349B (en) | Improved target tracking algorithm based on YOLOv3 and kernel correlation filtering | |
CN108197604A (en) | Fast face positioning and tracing method based on embedded device | |
CN111612817A (en) | Target tracking method based on depth feature adaptive fusion and context information | |
US9129152B2 (en) | Exemplar-based feature weighting | |
CA3136674C (en) | Methods and systems for crack detection using a fully convolutional network | |
CN111429485B (en) | Cross-modal filtering tracking method based on self-adaptive regularization and high-reliability updating | |
CN105279772A (en) | Trackability distinguishing method of infrared sequence image | |
CN111640138A (en) | Target tracking method, device, equipment and storage medium | |
CN108446613A (en) | A kind of pedestrian's recognition methods again based on distance centerization and projection vector study | |
CN112613565B (en) | Anti-occlusion tracking method based on multi-feature fusion and adaptive learning rate updating | |
CN106557740A (en) | The recognition methods of oil depot target in a kind of remote sensing images | |
CN105894037A (en) | Whole supervision and classification method of remote sensing images extracted based on SIFT training samples | |
CN117437406A (en) | Multi-target detection method and device | |
CN109766748B (en) | Pedestrian re-recognition method based on projection transformation and dictionary learning | |
CN115953371A (en) | Insulator defect detection method, device, equipment and storage medium | |
CN106447662A (en) | Combined distance based FCM image segmentation algorithm | |
CN108549905A (en) | A kind of accurate method for tracking target under serious circumstance of occlusion | |
CN109508674B (en) | Airborne downward-looking heterogeneous image matching method based on region division | |
CN110827327B (en) | Fusion-based long-term target tracking method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20210706 |