CN113077490A - Multilayer depth feature target tracking method based on reliability - Google Patents
- Publication number
- CN113077490A CN113077490A CN202110330225.3A CN202110330225A CN113077490A CN 113077490 A CN113077490 A CN 113077490A CN 202110330225 A CN202110330225 A CN 202110330225A CN 113077490 A CN113077490 A CN 113077490A
- Authority
- CN
- China
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20048—Transform domain processing
- G06T2207/20056—Discrete and fast Fourier transform, [DFT, FFT]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
Abstract
The invention discloses a reliability-based multilayer depth feature target tracking method. Exploiting the differing discriminative power of features from different network layers in a tracking scene, the method measures the representation capability of each layer's features by computing channel reliability, and fuses the positioning information of the different layers to obtain more accurate target position information. During tracking, the target size is updated in real time with a scale-pool technique, and the features extracted from the previous frame are fused with the original template features to obtain a more robust tracking model. The invention improves the representation capability of the model and achieves more accurate target positioning; improves the generalization capability of the model; improves the tracking model's ability to cope with target scale change in complex scenes; and improves its robustness to changes in target appearance and to external interference in the tracking scene. Compared with commonly used existing algorithms, the method achieves higher tracking precision and success rate.
Description
Technical Field
The invention relates to a multilayer depth characteristic target tracking method based on reliability, and belongs to the technical field of network communication.
Background
Target tracking is a research hotspot in the field of computer vision and is widely applied in video surveillance, human-computer interaction, intelligent transportation and other fields. The goal of the visual tracking task is to detect a continuously moving object in an image sequence, obtain its motion information, extract its motion trajectory, and analyze its motion so as to understand the object's behavior. Owing to the diversity and complexity of tracking scenes, existing target tracking algorithms still discriminate and localize the target inaccurately, so further improving their performance is of great research significance.
Shallow features mainly capture low-level information such as shape, texture and color, and strongly affect positioning accuracy; deep features carry rich semantic information and are more robust to deformation, motion blur and other challenges in complex tracking scenes, but their low resolution loses much spatial detail. How to comprehensively exploit shallow and deep information in complex tracking scenes is therefore a problem in urgent need of a solution.
The prior art has the following defect: traditional correlation-filter-based target tracking methods represent the target with a single feature, whose expressive power for the current target is insufficient; in particular, when similar objects interfere in the background, the tracker cannot distinguish the target and tracking may even fail.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a multilayer depth feature target tracking method based on reliability, which utilizes the difference of the discrimination capability of different layer features in a tracking scene and adopts channel reliability to fuse the positioning information of the different layer features; in the tracking process, the target size is updated in real time by adopting a scale pool technology, and the features extracted from the previous frame are fused with the original template features to obtain a tracking model with stronger robustness.
In order to achieve the purpose, the invention adopts the following technical scheme: a multilayer depth feature target tracking method based on reliability comprises the following steps:
step S1, take the first-frame input picture I_1 of the video sequence with its target frame, and the subsequent input pictures I_t, t ∈ [2, N], where N is the total frame number of the video sequence, (x, y) are the coordinates of the target centre point, and c = (c_w, c_h) is the target scale;
step S2, extract the target features of the current frame with the trained VGG-Net-19 network; according to the network characteristics, the outputs of the three layers conv3-4, conv4-4 and conv5-4 are extracted to describe the target, denoted f_d, d = 1, 2, 3;
step S3, use f_d, d = 1, 2, 3 as input features of the correlation filter to obtain the corresponding response maps R_d, d = 1, 2, 3;
step S4, for each response map R_d, d = 1, 2, 3, solve four indexes: the peak value, the peak-to-sidelobe ratio, the average peak-to-correlation energy, and the ratio of the secondary main peak to the main peak, denoted r_max(R_d), r_PSR(R_d), r_APCE(R_d) and r_RSFMP(R_d), d = 1, 2, 3;
step S5, obtain the reliability of the different input features and normalize it, the reliability being k_d, d = 1, 2, 3;
S6, fusing positioning information of the three input characteristics according to reliability weighting to obtain a final response graph R and obtain target positioning information;
step S7, estimate and update the target scale with the scale-pool technique to obtain the latest target scale c = (c_w, c_h);
Step S8, updating the target model;
and step S9, repeating the steps S2-S8 until all the frames of the current sequence are tracked to the end.
Further, solving the response maps R_d, d = 1, 2, 3 in step S3 is divided into a training part and a detection part, with the following concrete steps:
step S31, with f_d, d = 1, 2, 3 as the input feature of the correlation filter, train the filter parameters by solving arg min_{w_d} ||w_d * f_d - y||² + λ||w_d||², where * is the correlation operation, w_d are the correlation-filter parameters, y is a standard two-dimensional Gaussian distribution label, and λ is the regularization coefficient, taken as 0.0001; the filter parameters are obtained in the Fourier domain as W_d = ((F_d)^H ⊙ Y) / ((F_d)^H ⊙ F_d + λ), where F_d is the Fourier transform of f_d, (F_d)^H is the conjugate of F_d, Y is the Fourier transform of y, and Y^H is the conjugate of Y;
step S32, the response map of f_d is obtained by inverse Fourier transform as R_d = F^{-1}(W_d ⊙ Z_d), where F^{-1} is the inverse Fourier transform, Z_d is the Fourier transform of the feature representation z of the candidate region, and ⊙ is the dot-product operation.
Further, the concrete steps of solving the four indexes of the response map in step S4 are:
step S41, the response-map peak r_max(R_d) is the maximum of the response map, r_max(R_d) = max(R_d); the larger the peak, the stronger the discriminative power of the main peak;
step S42, the peak-to-sidelobe ratio is computed as r_PSR(R_d) = (r_max(R_d) - μ(R_d)) / σ(R_d), where μ(R_d) is the mean of the response map R_d and σ(R_d) is its standard deviation; the larger r_PSR(R_d), the more reliable the response map;
step S43, the average peak-to-correlation energy r_APCE(R_d) reflects the average fluctuation of the response map and the confidence of the detected target, and is computed as r_APCE(R_d) = |r_max(R_d) - min(R_d)|² / mean(Σ_{i,j}(R_d(i, j) - min(R_d))²), where min(R_d) is the minimum of the response map and R_d(i, j) is the response value at coordinate (i, j);
step S44, the ratio of the secondary main peak to the main peak, r_RSFMP(R_d), measures the prominence of the main mode of the response map; the larger r_RSFMP(R_d), the more prominent the main peak. It is computed as r_RSFMP(R_d) = 1 - min(r_peak2(R_d)/r_peak1(R_d), 0.5), where r_peak1(R_d) is the peak of the first main peak, r_peak1(R_d) = r_max(R_d), and r_peak2(R_d) is the peak of the second main peak; the first and second main peaks are not adjacent.
Further, the concrete steps of obtaining and normalizing the reliability of the different input features in step S5 are:
step S51, compute the reliability of each channel, k_d' = r_max(R_d)·r_PSR(R_d)·r_APCE(R_d)·r_RSFMP(R_d), d = 1, 2, 3;
step S52, normalize the computed channel reliabilities: k_d = k_d' / Σ_{i=1}^{3} k_i'.
Further, the response map fused in step S6 is R = Σ_{d=1}^{3} k_d R_d, and the final positioning information is obtained as the location of the peak of the final confidence map.
Further, in step S7 the target scale is estimated with the scale-pool technique. The scale pool is set to S = {s_1, s_2, ..., s_k}; that is, during tracking the candidate sizes {s_i c | s_i ∈ S} are evaluated and the one with the maximum reliability coefficient is selected as the current-frame size, with the following concrete calculation steps:
step S71, set the scale pool to S = {0.8, 0.85, 0.9, 0.95, 1.0, 1.05, 1.1, 1.15, 1.2}; each candidate size is evaluated with the correlation filter and the best-performing size is selected;
step S72, to keep the target size stable, it is updated smoothly: c_{t+1} = (1 - γ)c_t + γc'_{t+1}, where c'_{t+1} is the size estimated in step S71 and γ is the target-size learning rate, taken as 0.2.
Further, updating the target model in step S8 divides the correlation filter into a numerator part and a denominator part that are updated separately, W_d^t = A_d^t / B_d^t, where A_d^t is the numerator part and B_d^t is the denominator part, with the following concrete updating steps:
step S81, update the numerator part, A_d^t = (1 - β)A_d^{t-1} + β(F_d^t)^H ⊙ Y, d = 1, 2, 3, where β is the learning rate of the correlation-filter model, taken as 0.01;
step S82, update the denominator part analogously, B_d^t = (1 - β)B_d^{t-1} + β((F_d^t)^H ⊙ F_d^t + λ), d = 1, 2, 3.
compared with the prior art, the invention has the following advantages:
firstly, the shallow structure characteristics and the deep structure characteristics in the VGG-Net-19 network are used at the same time, so that the representation capability of the model is improved, and more accurate target positioning is realized;
secondly, the reliability of each input characteristic response graph is comprehensively measured by introducing different indexes, and the generalization capability of the model is improved;
thirdly, a scale pool technology is introduced in the target tracking process, so that the capability of a tracking model for coping with target scale change in a complex scene is improved;
fourthly, a template updating mechanism is adopted in the target tracking process, so that the robustness of a tracking model for responding to target representation change and external interference in a tracking scene is improved;
finally, through the cooperation of all links, the tracking accuracy and the success rate are better compared with the existing common comparison algorithm.
Drawings
FIG. 1 is a schematic structural view of the present invention;
FIG. 2 is a flow chart of the present invention;
FIG. 3 is a schematic diagram of the output of each layer of the VGG-Net-19 network;
FIG. 4 is a simulation diagram comparing the comprehensive precision of the present invention and existing common target tracking algorithms on the OTB100 standard data set;
fig. 5 is a simulation diagram comparing the tracking success rate of the present invention and the existing common target tracking algorithm on the OTB100 standard data set.
Detailed Description
The technical solutions in the implementation of the present invention will be made clear and fully described below with reference to the accompanying drawings, and the described embodiments are only a part of the embodiments of the present invention, but not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
As shown in fig. 1 to 5, the method for tracking a multilayer depth feature target based on reliability provided by the present invention includes the following steps:
step S1, take the first-frame input picture I_1 of the video sequence with its target frame, and the subsequent input pictures I_t, t ∈ [2, N], where N is the total frame number of the video sequence, (x, y) are the coordinates of the target centre point, and c = (c_w, c_h) is the target scale;
step S2, extract the target features of the current frame with the trained VGG-Net-19 network; according to the network characteristics, the outputs of the three layers conv3-4, conv4-4 and conv5-4 are extracted to describe the target, denoted f_d, d = 1, 2, 3;
step S3, use f_d, d = 1, 2, 3 as input features of the correlation filter to obtain the corresponding response maps R_d, d = 1, 2, 3; the whole solving process is divided into a training part and a detection part, with the following concrete steps:
step S31, with f_d, d = 1, 2, 3 as the input feature of the correlation filter, train the filter parameters by solving arg min_{w_d} ||w_d * f_d - y||² + λ||w_d||², where * is the correlation operation, w_d are the correlation-filter parameters, y is a standard two-dimensional Gaussian distribution label, and λ is the regularization coefficient, taken as 0.0001; the filter parameters are obtained in the Fourier domain as W_d = ((F_d)^H ⊙ Y) / ((F_d)^H ⊙ F_d + λ), where F_d is the Fourier transform of f_d, (F_d)^H is the conjugate of F_d, Y is the Fourier transform of y, and Y^H is the conjugate of Y;
step S32, the response map of f_d is obtained by inverse Fourier transform as R_d = F^{-1}(W_d ⊙ Z_d), where F^{-1} is the inverse Fourier transform, Z_d is the Fourier transform of the feature representation z of the candidate region, and ⊙ is the dot-product operation;
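Steps S31-S32 can be sketched in NumPy for a single feature channel. This is a minimal toy illustration, not the patent's implementation: the 64x64 patch size, the Gaussian label width, and the random stand-in feature values are assumptions.

```python
import numpy as np

def train_filter(f, y, lam=1e-4):
    """Closed-form correlation filter in the Fourier domain (one channel):
    W = (F^H . Y) / (F^H . F + lam), computed element-wise."""
    F = np.fft.fft2(f)
    Y = np.fft.fft2(y)
    return (np.conj(F) * Y) / (np.conj(F) * F + lam)

def detect(Wf, z):
    """Response map R = F^{-1}(W . Z) for a candidate patch z."""
    return np.real(np.fft.ifft2(Wf * np.fft.fft2(z)))

# toy usage: a 2-D Gaussian label centred in a 64x64 patch
np.random.seed(0)
h, w = 64, 64
yy, xx = np.mgrid[0:h, 0:w]
y = np.exp(-(((yy - h // 2) ** 2 + (xx - w // 2) ** 2) / (2 * 3.0 ** 2)))
f = np.random.rand(h, w)   # stand-in for one deep-feature channel
Wf = train_filter(f, y)
R = detect(Wf, f)          # detecting on the training patch itself
# the response peaks at the label centre, here (32, 32)
```

The regularizer lam keeps the per-frequency division well conditioned; with a small lam the response on the training patch reproduces the Gaussian label almost exactly.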
step S4, for each response map R_d, d = 1, 2, 3, solve four indexes: the peak value, the peak-to-sidelobe ratio, the average peak-to-correlation energy, and the ratio of the secondary main peak to the main peak, denoted r_max(R_d), r_PSR(R_d), r_APCE(R_d) and r_RSFMP(R_d), d = 1, 2, 3; the concrete steps of solving these four performance indexes are:
step S41, the response-map peak r_max(R_d) is the maximum of the response map, r_max(R_d) = max(R_d); the larger the peak, the stronger the discriminative power of the main peak;
step S42, the peak-to-sidelobe ratio is computed as r_PSR(R_d) = (r_max(R_d) - μ(R_d)) / σ(R_d), where μ(R_d) is the mean of the response map R_d and σ(R_d) is its standard deviation; the larger r_PSR(R_d), the more reliable the response map;
step S43, the average peak-to-correlation energy r_APCE(R_d) reflects the average fluctuation of the response map and the confidence of the detected target, and is computed as r_APCE(R_d) = |r_max(R_d) - min(R_d)|² / mean(Σ_{i,j}(R_d(i, j) - min(R_d))²), where min(R_d) is the minimum of the response map and R_d(i, j) is the response value at coordinate (i, j);
step S44, the ratio of the secondary main peak to the main peak, r_RSFMP(R_d), measures the prominence of the main mode of the response map; the larger r_RSFMP(R_d), the more prominent the main peak. It is computed as r_RSFMP(R_d) = 1 - min(r_peak2(R_d)/r_peak1(R_d), 0.5), where r_peak1(R_d) is the peak of the first main peak, r_peak1(R_d) = r_max(R_d), and r_peak2(R_d) is the peak of the second main peak; the first and second main peaks are not adjacent;
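The four indexes of steps S41-S44 can be sketched in NumPy as below. The 5x5 mask used to locate the non-adjacent second main peak is an assumption; the patent only requires that the two peaks not be adjacent, without fixing a neighbourhood size.

```python
import numpy as np

def response_indices(R):
    """Return (r_max, r_PSR, r_APCE, r_RSFMP) for a response map R."""
    rmax, rmin = R.max(), R.min()
    # peak-to-sidelobe ratio: (peak - mean) / std
    r_psr = (rmax - R.mean()) / (R.std() + 1e-12)
    # average peak-to-correlation energy
    r_apce = (rmax - rmin) ** 2 / (np.mean((R - rmin) ** 2) + 1e-12)
    # second main peak: mask a 5x5 window around the main peak (assumption)
    i, j = np.unravel_index(np.argmax(R), R.shape)
    Rm = R.copy()
    Rm[max(0, i - 2):i + 3, max(0, j - 2):j + 3] = rmin
    r_rsfmp = 1.0 - min(Rm.max() / (rmax + 1e-12), 0.5)
    return rmax, r_psr, r_apce, r_rsfmp

# a clean single-peak map scores high on every index
yy, xx = np.mgrid[0:64, 0:64]
R = np.exp(-(((yy - 20) ** 2 + (xx - 40) ** 2) / (2 * 2.0 ** 2)))
rmax, r_psr, r_apce, r_rsfmp = response_indices(R)
```

Note that with the min(., 0.5) clamp, r_RSFMP always lies in [0.5, 1.0], so a flat multi-modal map is floored at 0.5 rather than driven to zero.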
step S5, obtain the reliability of the different input features and normalize it, the reliability being k_d, d = 1, 2, 3; the whole process is divided into reliability solving and normalization, with the following detailed steps:
step S51, compute the reliability of each channel, k_d' = r_max(R_d)·r_PSR(R_d)·r_APCE(R_d)·r_RSFMP(R_d), d = 1, 2, 3;
step S52, normalize the computed channel reliabilities: k_d = k_d' / Σ_{i=1}^{3} k_i';
step S6, fuse the positioning information of the three input features by reliability weighting to obtain the final response map R and the target positioning information; the fused response map is R = Σ_{d=1}^{3} k_d R_d, and the final positioning information is the location of the peak of R;
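Steps S51-S52 and S6 together amount to a reliability-weighted sum of response maps. A toy NumPy sketch follows; the compact reliability product and the 5x5 second-peak mask are assumptions, and the noisy channels are synthetic stand-ins for unreliable layers.

```python
import numpy as np

def channel_reliability(R):
    """k_d' = r_max * r_PSR * r_APCE * r_RSFMP for one response map."""
    rmax, rmin = R.max(), R.min()
    psr = (rmax - R.mean()) / (R.std() + 1e-12)
    apce = (rmax - rmin) ** 2 / (np.mean((R - rmin) ** 2) + 1e-12)
    i, j = np.unravel_index(np.argmax(R), R.shape)
    Rm = R.copy()
    Rm[max(0, i - 2):i + 3, max(0, j - 2):j + 3] = rmin
    rsfmp = 1.0 - min(Rm.max() / (rmax + 1e-12), 0.5)
    return rmax * psr * apce * rsfmp

def fuse(maps):
    """R = sum_d k_d R_d with k_d = k_d' / sum_i k_i'.
    Returns the fused map and its peak position (the target location)."""
    k = np.array([channel_reliability(R) for R in maps])
    k = k / k.sum()
    R = np.tensordot(k, np.stack(maps), axes=1)
    return R, np.unravel_index(np.argmax(R), R.shape)

np.random.seed(1)
yy, xx = np.mgrid[0:64, 0:64]
sharp = np.exp(-(((yy - 10) ** 2 + (xx - 10) ** 2) / (2 * 2.0 ** 2)))
noisy1 = 0.3 * np.random.rand(64, 64)   # unreliable channels
noisy2 = 0.3 * np.random.rand(64, 64)
R, pos = fuse([sharp, noisy1, noisy2])  # the sharp channel dominates
```

Because the sharp channel's reliability product dwarfs that of the noisy channels, the fused peak coincides with the sharp channel's peak.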
step S7, estimate and update the target scale with the scale-pool technique to obtain the latest target scale c = (c_w, c_h); the scale pool is set to S = {s_1, s_2, ..., s_k}, i.e., during tracking the candidate sizes {s_i c | s_i ∈ S} are evaluated and the one with the maximum reliability coefficient is selected as the current-frame size, with the following concrete calculation steps:
step S71, set the scale pool to S = {0.8, 0.85, 0.9, 0.95, 1.0, 1.05, 1.1, 1.15, 1.2}; each candidate size is evaluated with the correlation filter and the best-performing size is selected;
step S72, to keep the target size stable, it is updated smoothly: c_{t+1} = (1 - γ)c_t + γc'_{t+1}, where c'_{t+1} is the size estimated in step S71 and γ is the target-size learning rate, taken as 0.2;
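Steps S71-S72 can be sketched as follows. Here `score_fn` is an assumed callback standing in for the reliability of the filter response at a candidate size; the patent evaluates each size with the correlation filter itself.

```python
SCALE_POOL = [0.8, 0.85, 0.9, 0.95, 1.0, 1.05, 1.1, 1.15, 1.2]

def update_scale(score_fn, c, gamma=0.2):
    """Evaluate every candidate size s*c from the scale pool, pick the
    best one, then smooth: c_{t+1} = (1 - gamma) * c_t + gamma * c_best."""
    cw, ch = c
    best = max(SCALE_POOL, key=lambda s: score_fn((s * cw, s * ch)))
    return ((1 - gamma) * cw + gamma * best * cw,
            (1 - gamma) * ch + gamma * best * ch)

# toy score that prefers a width of 110 -> best scale 1.1 for c = (100, 50)
c_next = update_scale(lambda size: -abs(size[0] - 110.0), (100.0, 50.0))
# c_next is approximately (102.0, 51.0): an 80/20 blend of old and new size
```

The small learning rate gamma = 0.2 damps frame-to-frame size jitter, trading responsiveness for stability.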
step S8, because filter training is affected by target deformation, scale change, occlusion and other factors, drift can occur, so the target model needs to be updated; specifically, the correlation filter is divided into a numerator part and a denominator part that are updated separately, W_d^t = A_d^t / B_d^t, where A_d^t is the numerator part and B_d^t is the denominator part, with the following detailed steps:
step S81, update the numerator part, A_d^t = (1 - β)A_d^{t-1} + β(F_d^t)^H ⊙ Y, d = 1, 2, 3, where β is the learning rate of the correlation-filter model, taken as 0.01;
step S82, update the denominator part analogously, B_d^t = (1 - β)B_d^{t-1} + β((F_d^t)^H ⊙ F_d^t + λ), d = 1, 2, 3;
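A minimal NumPy sketch of the numerator/denominator running update. Note the denominator update is inferred by symmetry with the closed-form filter solution; the patent text only spells out the numerator step, so that line is an assumption.

```python
import numpy as np

def update_filter(A_prev, B_prev, f, y, beta=0.01, lam=1e-4):
    """Running update of the filter's numerator A and denominator B:
        A_t = (1 - beta) * A_{t-1} + beta * (F^H . Y)
        B_t = (1 - beta) * B_{t-1} + beta * (F^H . F + lam)   # assumed
    Returns the updated parts and the current filter W_t = A_t / B_t."""
    F = np.fft.fft2(f)
    Y = np.fft.fft2(y)
    A = (1 - beta) * A_prev + beta * (np.conj(F) * Y)
    B = (1 - beta) * B_prev + beta * (np.conj(F) * F + lam)
    return A, B, A / B

np.random.seed(2)
f = np.random.rand(32, 32)
y = np.random.rand(32, 32)
A0 = np.zeros((32, 32), dtype=complex)
B0 = np.ones((32, 32), dtype=complex)
# with beta = 1 the update reduces to training from scratch on (f, y)
A1, B1, W1 = update_filter(A0, B0, f, y, beta=1.0)
```

Updating numerator and denominator separately, rather than averaging the filter W directly, keeps the running estimate consistent with the ridge-regression solution.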
and step S9, repeating the steps S2-S8 until all the frames of the current sequence are tracked to the end.
Steps S1-S6 form the target tracking process, step S7 the target-scale updating process, and step S8 the tracking-model updating process; together they constitute the complete target tracking procedure. In actual tracking, steps S1-S8 are repeated to complete the whole tracking task, and the target position information is obtained from steps S6 and S7.
To verify the effectiveness of the proposed multi-layer depth feature target tracking method (A Multiple Deep Feature Tracking Algorithm Based on Reliability, MDPR), comparison experiments were carried out on the OTB100 data set against currently common target tracking algorithms, specifically:
Comparison algorithm 2, CSK (Henriques J F, Caseiro R, Martins P, et al. Exploiting the Circulant Structure of Tracking-by-Detection with Kernels [C]// Proceedings of European Conference on Computer Vision, 2012: 702-);
Comparison algorithm 3, MOSSE_CA (Bolme D S, Beveridge J R, Draper B A, et al. Visual object tracking using adaptive correlation filters [C]// 2010 IEEE Conference on Computer Vision and Pattern Recognition, 2010: 2544-);
Comparison algorithm 4, SAMF (Li Y, Zhu J. A Scale Adaptive Kernel Correlation Filter Tracker with Feature Integration [C]// Proceedings of European Conference on Computer Vision, 2014: 254-);
Comparison algorithm 5, DCF_CA (Henriques J F, Caseiro R, Martins P, et al. High-Speed Tracking with Kernelized Correlation Filters [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(3): 583-);
Comparison algorithm 6, RPT (Li Y, Zhu J, Hoi S C H. Reliable Patch Trackers: Robust visual tracking by exploiting reliable patches [C]// Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2015: 353-361.);
Comparison algorithm 7, KCF_MTSA (Bibi A, Ghanem B. Multi-template Scale-Adaptive Kernelized Correlation Filters [C]// Proceedings of IEEE International Conference on Computer Vision Workshops, 2015: 613-).
The comparison simulation experiments use quantitative analysis, i.e., tracking performance is judged by computing evaluation indexes. The indexes adopted are tracking precision and tracking success rate; the corresponding results are shown in Fig. 4 and Fig. 5. In Fig. 4 the abscissa is the threshold on the distance between the centre of the target position estimated by an algorithm and the manually annotated target centre, and the ordinate is the fraction of frames whose centre error is below that threshold, i.e., the precision; in Fig. 5 the abscissa is the overlap threshold between the bounding box estimated by an algorithm and the manually annotated bounding box, and the ordinate is the fraction of frames whose overlap exceeds that threshold, i.e., the success rate.
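The two metrics can be computed as below. This is a generic sketch under the usual definitions; the official OTB toolkit additionally sweeps the thresholds to produce the full curves shown in the figures.

```python
import numpy as np

def precision(pred_centers, gt_centers, threshold=20.0):
    """Fraction of frames whose centre location error is <= threshold px."""
    d = np.linalg.norm(np.asarray(pred_centers, float)
                       - np.asarray(gt_centers, float), axis=1)
    return float(np.mean(d <= threshold))

def success(pred_boxes, gt_boxes, threshold=0.5):
    """Fraction of frames whose IoU with the ground truth is >= threshold.
    Boxes are (x, y, w, h)."""
    p = np.asarray(pred_boxes, float)
    g = np.asarray(gt_boxes, float)
    x1 = np.maximum(p[:, 0], g[:, 0])
    y1 = np.maximum(p[:, 1], g[:, 1])
    x2 = np.minimum(p[:, 0] + p[:, 2], g[:, 0] + g[:, 2])
    y2 = np.minimum(p[:, 1] + p[:, 3], g[:, 1] + g[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    union = p[:, 2] * p[:, 3] + g[:, 2] * g[:, 3] - inter
    return float(np.mean(inter / union >= threshold))
```

For example, with one frame whose centre error is 2 px and one whose error is 40 px, precision at the 20 px threshold is 0.5.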
Combining Fig. 4 and Fig. 5, the proposed MDPR shows better tracking precision and success rate on the OTB100 data set than the above comparison algorithms, with a comprehensive precision of 79.3% and a success rate of 75.1%.
In conclusion, in complex scenes involving illumination change, rotation, scale change and the like, the invention measures the representation capabilities of different layer features by computing channel reliability, and then fuses their positioning information to obtain more accurate target position information; during tracking, the target size is updated in real time with the scale-pool technique, and the features extracted from the previous frame are fused with the original template features to obtain a more robust tracking model.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. Furthermore, it should be understood that although the present description refers to embodiments, not every embodiment may contain only a single embodiment, and such description is for clarity only, and those skilled in the art should make the description as a whole, and the embodiments may be appropriately combined to form other embodiments understood by those skilled in the art.
Claims (7)
1. A multilayer depth feature target tracking method based on reliability is characterized by comprising the following steps:
step S1, take the first-frame input picture I_1 of the video sequence with its target frame, and the subsequent input pictures I_t, t ∈ [2, N], where N is the total frame number of the video sequence, (x, y) are the coordinates of the target centre point, and c = (c_w, c_h) is the target scale;
step S2, extract the target features of the current frame with the trained VGG-Net-19 network; according to the network characteristics, the outputs of the three layers conv3-4, conv4-4 and conv5-4 are extracted to describe the target, denoted f_d, d = 1, 2, 3;
step S3, use f_d, d = 1, 2, 3 as input features of the correlation filter to obtain the corresponding response maps R_d, d = 1, 2, 3;
step S4, for each response map R_d, d = 1, 2, 3, solve four indexes: the peak value, the peak-to-sidelobe ratio, the average peak-to-correlation energy, and the ratio of the secondary main peak to the main peak, denoted r_max(R_d), r_PSR(R_d), r_APCE(R_d) and r_RSFMP(R_d), d = 1, 2, 3;
step S5, obtain the reliability of the different input features and normalize it, the reliability being k_d, d = 1, 2, 3;
step S6, fuse the positioning information of the three input features by reliability weighting to obtain the final response map R and the target positioning information;
step S7, estimate and update the target scale with the scale-pool technique to obtain the latest target scale c = (c_w, c_h);
step S8, update the target model;
step S9, repeat steps S2-S8 until all frames of the current sequence have been tracked.
2. The reliability-based multilayer depth feature target tracking method according to claim 1, wherein solving the response maps R_d, d = 1, 2, 3 in step S3 is divided into a training part and a detection part, with the following concrete steps:
step S31, with f_d, d = 1, 2, 3 as the input feature of the correlation filter, train the filter parameters by solving arg min_{w_d} ||w_d * f_d - y||² + λ||w_d||², where * is the correlation operation, w_d are the correlation-filter parameters, y is a standard two-dimensional Gaussian distribution label, and λ is the regularization coefficient, taken as 0.0001; the filter parameters are obtained as W_d = ((F_d)^H ⊙ Y) / ((F_d)^H ⊙ F_d + λ), where F_d is the Fourier transform of f_d, (F_d)^H is the conjugate of F_d, Y is the Fourier transform of y, and Y^H is the conjugate of Y;
3. The reliability-based multi-layer depth feature target tracking method according to claim 1, wherein the solving of the four indexes of the response map in the step S4 includes:
step S41, response diagram peak value rmax(Rd) In response to the maximum indicator of the map, rmax(Rd)=max(Rd) The larger the peak value is, the stronger the resolution of the main peak is;
Step S42, Peak to sidelobe ratio rPSR(Rd) The calculation formula of (2) is as follows:wherein, mu (R)d) Is a response graph RdMean value of (a), (b), (c), (dd) Is a response graph RdStandard deviation of (a), rPSR(Rd) The larger the response map is, the more reliable the response map is;
step S43, average peak correlation energy rAPCE(Rd) The average peak correlation energy index of the response map is used for reflecting the average fluctuation degree of the response map and the confidence level of the detected target, and the calculation formula isWherein, min (R)d) In response to the minimum value, Rd(i, j) is a response value with the coordinate position (i, j) in the response map;
step S44, the ratio of the second main peak to the main peak, r_RSFMP(R_d), measures how prominent the dominant mode of the response map is; the larger r_RSFMP(R_d) is, the more prominent the main peak; it is calculated as r_RSFMP(R_d) = 1 - min(r_peak2(R_d) / r_peak1(R_d), 0.5), wherein r_peak1(R_d) is the peak of the first main peak in the response map, r_peak1(R_d) = r_max(R_d), and r_peak2(R_d) is the peak of the second main peak; the first and second main peaks are non-adjacent.
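The four indices of steps S41-S44 can be computed from a response map as below. This is a sketch that assumes the second main peak is taken as the largest value outside a small window around the main peak; the claim only requires the two peaks to be non-adjacent, so the `exclude` window size is an assumption:

```python
import numpy as np

def response_indices(R, exclude=2):
    """Four reliability indices of a response map R (steps S41-S44)."""
    r_max = float(R.max())                                # S41: peak value
    mu, sigma = R.mean(), R.std()
    r_psr = (r_max - mu) / (sigma + 1e-12)                # S42: peak-to-sidelobe ratio
    r_min = float(R.min())
    r_apce = (r_max - r_min) ** 2 / np.mean((R - r_min) ** 2)  # S43: APCE
    # S44: second main peak = max outside a (2*exclude+1)^2 window at the main peak
    i, j = np.unravel_index(np.argmax(R), R.shape)
    masked = R.copy()
    masked[max(i - exclude, 0):i + exclude + 1,
           max(j - exclude, 0):j + exclude + 1] = r_min
    r_peak2 = float(masked.max())
    r_rsfmp = 1.0 - min(r_peak2 / r_max, 0.5)
    return r_max, r_psr, r_apce, r_rsfmp
```

For a clean unimodal response all four indices are high; a noisy or multi-modal response drives the PSR, APCE, and RSFMP values down, which is what the reliability weighting in claim 4 exploits.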
4. The reliability-based multi-layer depth feature target tracking method according to claim 1, wherein the step S5 of obtaining the reliability of different input features and normalizing comprises the following specific steps:
step S51, calculating the reliability of each channel, k'_d = r_max(R_d) · r_PSR(R_d) · r_APCE(R_d) · r_RSFMP(R_d), d = 1, 2, 3;
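Step S51 multiplies the four indices per layer. The normalization of step S52 is not reproduced in this excerpt, so the sketch below assumes the common choice of dividing by the sum, which makes the weights add to one:

```python
import numpy as np

def channel_reliabilities(indices):
    """indices: list of (r_max, r_psr, r_apce, r_rsfmp) tuples, one per layer d.
    Returns normalized reliability weights k_d (S51 product; sum-normalization
    for S52 is an assumption of this sketch)."""
    k = np.array([m * p * a * s for (m, p, a, s) in indices], dtype=float)
    return k / k.sum()
```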
6. The target tracking method according to claim 1, wherein in step S7 a scale pool technique is used to estimate the target scale; the scale pool is set to S = {s_1, s_2, ..., s_k}, i.e., during tracking the candidate sizes s_i·c, s_i ∈ S, are evaluated and the one with the largest reliability coefficient is selected as the size of the current frame, with the following concrete calculation steps:
step S71, the scale pool is set to S = {0.8, 0.85, 0.9, 0.95, 1.0, 1.05, 1.1, 1.15, 1.2}; each of these scales is evaluated separately, and the best-performing size is selected;
step S72, to keep the target size stable, the size is updated smoothly, c_{t+1} = (1 - γ)·c_t + γ·ĉ_{t+1}, wherein ĉ_{t+1} is the size estimated in step S71 and γ is the target-size learning rate, taken as 0.2.
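Steps S71-S72 can be sketched as follows. The per-scale scoring formula is not reproduced in this excerpt, so it is left as a caller-supplied `score_fn` (an assumption of this sketch, not the claim's exact criterion):

```python
import numpy as np

SCALE_POOL = (0.8, 0.85, 0.9, 0.95, 1.0, 1.05, 1.1, 1.15, 1.2)  # step S71

def estimate_scale(score_fn, c_t, gamma=0.2):
    """Evaluate each candidate size s_i * c_t and keep the best one (S71),
    then smooth the update with learning rate gamma = 0.2 (S72)."""
    c_t = np.asarray(c_t, dtype=float)
    best = max((s * c_t for s in SCALE_POOL),
               key=lambda c: score_fn(tuple(c)))
    return (1.0 - gamma) * c_t + gamma * best
```

The smoothing in S72 damps frame-to-frame jitter: even when the best candidate jumps a scale step, the tracked size only moves 20% of the way toward it per frame.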
7. The target tracking method according to claim 1, wherein the updating of the target model in step S8 splits the correlation filter into a denominator part and a numerator part that are updated separately, W_d = A_d / B_d, wherein A_d is the numerator part and B_d is the denominator part; the concrete updating steps are as follows:
step S81, updating the numerator part, A_d^t = (1 - β)·A_d^{t-1} + β·(Y^H ⊙ F_d^t), wherein β is the learning rate of the correlation filter model, taken as 0.01;
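The numerator update of step S81 can be sketched as below. The denominator update (step S82) does not appear in this excerpt, so its symmetric MOSSE-style form here is an assumption:

```python
import numpy as np

def update_model(A_prev, B_prev, F_t, Y, beta=0.01, lam=1e-4):
    """Running update of the correlation filter's numerator/denominator parts.

    A_prev, B_prev : numerator/denominator from the previous frame
    F_t : Fourier transform of the current frame's feature channel
    Y   : Fourier transform of the Gaussian label
    """
    A_t = (1.0 - beta) * A_prev + beta * (np.conj(Y) * F_t)          # S81: numerator
    B_t = (1.0 - beta) * B_prev + beta * (np.conj(F_t) * F_t + lam)  # assumed S82
    return A_t, B_t, A_t / B_t   # W_d = numerator / denominator
```

Updating numerator and denominator separately (rather than averaging W_d itself) keeps the running estimate equivalent to a filter trained on an exponentially weighted history of frames.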
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110330225.3A CN113077490A (en) | 2021-03-29 | 2021-03-29 | Multilayer depth feature target tracking method based on reliability |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113077490A true CN113077490A (en) | 2021-07-06 |
Family
ID=76611161
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110330225.3A Pending CN113077490A (en) | 2021-03-29 | 2021-03-29 | Multilayer depth feature target tracking method based on reliability |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113077490A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114359330A (en) * | 2021-11-01 | 2022-04-15 | 中国人民解放军陆军工程大学 | Long-term target tracking method and system fusing depth information |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107943837A (en) * | 2017-10-27 | 2018-04-20 | 江苏理工学院 | A kind of video abstraction generating method of foreground target key frame |
CN109410247A (en) * | 2018-10-16 | 2019-03-01 | 中国石油大学(华东) | A kind of video tracking algorithm of multi-template and adaptive features select |
CN111311647A (en) * | 2020-01-17 | 2020-06-19 | 长沙理工大学 | Target tracking method and device based on global-local and Kalman filtering |
Non-Patent Citations (4)
Title |
---|
小小菜鸟一只: "Target tracking algorithm HCF: Hierarchical Convolutional Features for Visual Tracking", pages 1 - 6, Retrieved from the Internet <URL:https://blog.csdn.net/crazyice521/article/details/65935753> *
Yin Mingfeng et al.: "Multi-feature visual tracking based on weighted spatio-temporal context learning", Journal of Chinese Inertial Technology, vol. 27, no. 1, pages 43 - 50 *
Yin Mingfeng et al.: "Particle filter visual tracking based on an improved spatial-histogram similarity measure (in English)", Journal of Chinese Inertial Technology, vol. 26, no. 3, pages 359 - 365 *
Yin Mingfeng et al.: "Multi-scale background-aware correlation filter tracking algorithm based on channel reliability", Acta Optica Sinica, vol. 39, no. 5, pages 1 - 11 *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110363122B (en) | Cross-domain target detection method based on multi-layer feature alignment | |
CN108921873B (en) | Markov decision-making online multi-target tracking method based on kernel correlation filtering optimization | |
CN104200495B (en) | A kind of multi-object tracking method in video monitoring | |
CN110490913B (en) | Image matching method based on feature description operator of corner and single line segment grouping | |
CN107633226B (en) | Human body motion tracking feature processing method | |
CN111553425B (en) | Template matching LSP algorithm, medium and equipment for visual positioning | |
CN111582349B (en) | Improved target tracking algorithm based on YOLOv3 and kernel correlation filtering | |
CN108197604A (en) | Fast face positioning and tracing method based on embedded device | |
CN111612817A (en) | Target tracking method based on depth feature adaptive fusion and context information | |
US9129152B2 (en) | Exemplar-based feature weighting | |
CA3136674C (en) | Methods and systems for crack detection using a fully convolutional network | |
CN111429485B (en) | Cross-modal filtering tracking method based on self-adaptive regularization and high-reliability updating | |
CN105279772A (en) | Trackability distinguishing method of infrared sequence image | |
CN111640138A (en) | Target tracking method, device, equipment and storage medium | |
CN108446613A (en) | A kind of pedestrian's recognition methods again based on distance centerization and projection vector study | |
CN112613565B (en) | Anti-occlusion tracking method based on multi-feature fusion and adaptive learning rate updating | |
CN106557740A (en) | The recognition methods of oil depot target in a kind of remote sensing images | |
CN105894037A (en) | Whole supervision and classification method of remote sensing images extracted based on SIFT training samples | |
CN117437406A (en) | Multi-target detection method and device | |
CN109766748B (en) | Pedestrian re-recognition method based on projection transformation and dictionary learning | |
CN115953371A (en) | Insulator defect detection method, device, equipment and storage medium | |
CN106447662A (en) | Combined distance based FCM image segmentation algorithm | |
CN108549905A (en) | A kind of accurate method for tracking target under serious circumstance of occlusion | |
CN109508674B (en) | Airborne downward-looking heterogeneous image matching method based on region division | |
CN110827327B (en) | Fusion-based long-term target tracking method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20210706 |