
CN113077490A - Multilayer depth feature target tracking method based on reliability - Google Patents


Info

Publication number
CN113077490A
CN113077490A (application number CN202110330225.3A)
Authority
CN
China
Prior art keywords
target
response
reliability
peak
tracking
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110330225.3A
Other languages
Chinese (zh)
Inventor
尹明锋
周文娟
游丽萍
花旭
陈昌凯
金圣昕
周林苇
贝绍轶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu China Israel Industrial Technology Research Institute
Jiangsu University of Technology
Original Assignee
Jiangsu China Israel Industrial Technology Research Institute
Jiangsu University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu China Israel Industrial Technology Research Institute and Jiangsu University of Technology
Priority to CN202110330225.3A
Publication of CN113077490A
Legal status: Pending

Classifications

    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06F18/253 Fusion techniques of extracted features
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods (neural networks)
    • G06T2207/10016 Video; image sequence
    • G06T2207/20056 Discrete and fast Fourier transform [DFT, FFT]
    • G06T2207/20081 Training; learning
    • G06T2207/20084 Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a reliability-based multilayer depth feature target tracking method. Exploiting the different discriminative power that features from different network layers show in a tracking scene, the method measures the characterization ability of each layer by computing a per-channel reliability and then fuses the localization information of the different layers to obtain more accurate target position information. During tracking, the target size is updated in real time with a scale-pool technique, and the features extracted from the previous frame are fused with the original template features to obtain a more robust tracking model. The invention improves the representation ability of the model and achieves more accurate target localization; improves the generalization ability of the model; improves the tracking model's ability to cope with target scale change in complex scenes; improves the tracking model's robustness to changes in target appearance and to external interference; and achieves higher tracking precision and success rate than existing common comparison algorithms.

Description

Multilayer depth feature target tracking method based on reliability
Technical Field
The invention relates to a reliability-based multilayer depth feature target tracking method and belongs to the technical field of network communication.
Background
Target tracking is a research hotspot in the computer field, with wide application in video surveillance, human-computer interaction, intelligent transportation, and other areas. The goal of a visual tracking task is to detect a continuously moving object in an image sequence, obtain its motion information, extract its motion trajectory, and analyze its motion so as to understand its behavior. Owing to the diversity and complexity of tracking scenes, existing target tracking algorithms still discriminate and localize targets inaccurately, so further improving their performance is of great research significance.
Shallow features mainly capture low-level information such as shape, texture, and color, which strongly affects localization accuracy; deep features carry rich semantic information and are more robust to deformation, motion blur, and other challenges in complex tracking scenes, but their low resolution loses much spatial detail. How to comprehensively exploit shallow and deep information in a complex tracking scene is therefore a problem that urgently needs to be solved.
The prior art has the following defect: traditional correlation-filter-based target tracking methods represent the target with a single feature, so the feature's expressive power for the current target is insufficient; in particular, when similar objects interfere in the background, the tracker cannot distinguish the target and may even fail.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a reliability-based multilayer depth feature target tracking method that exploits the different discriminative power of features from different layers in a tracking scene and uses channel reliability to fuse their localization information; during tracking, the target size is updated in real time with a scale-pool technique, and the features extracted from the previous frame are fused with the original template features to obtain a more robust tracking model.
To achieve this purpose, the invention adopts the following technical scheme. A reliability-based multilayer depth feature target tracking method comprises the following steps:
Step S1: input the first frame I_1 of the video sequence together with the target box, then each subsequent frame I_t, t ∈ [2, N], where N is the total number of frames in the sequence, (x, y) are the coordinates of the target center, and c = (c_w, c_h) is the target scale;
Step S2: extract target features of the current frame with a pre-trained VGG-Net-19 network, using the outputs of layers conv3-4, conv4-4, and conv5-4 to describe the target, denoted f_d, d = 1, 2, 3;
Step S3: take f_d, d = 1, 2, 3 as input features of the correlation filters to obtain the corresponding response maps R_d, d = 1, 2, 3;
Step S4: for each response map R_d, d = 1, 2, 3, compute the peak value, peak-to-sidelobe ratio, average peak-to-correlation energy, and second-to-first main peak ratio, denoted r_max(R_d), r_PSR(R_d), r_APCE(R_d), and r_RSFMP(R_d);
Step S5: obtain the reliability of the different input features and normalize it, denoted k_d, d = 1, 2, 3;
Step S6: fuse the localization information of the three input features with reliability weights to obtain the final response map R and the target position;
Step S7: estimate and update the target scale with a scale-pool technique, obtaining the latest target scale c = (c_w, c_h);
Step S8: update the target model;
Step S9: repeat steps S2 to S8 until every frame of the current sequence has been tracked.
Further, solving the response maps R_d, d = 1, 2, 3 in step S3 is divided into a training part and a detection part, as follows:
Step S31: take f_d, d = 1, 2, 3 as the input feature of the correlation filter and train the filter parameters by w_d = arg min ||w_d * f_d - y||^2 + λ||w_d||^2, where * is the correlation operation, w_d are the correlation filter parameters, y is a standard two-dimensional Gaussian distribution label, and λ is the regularization coefficient, set to 0.0001; the filter parameters are obtained in the Fourier domain as
W_d = ((F_d)^H ⊙ Y) / ((F_d)^H ⊙ F_d + λ),
where F_d is the Fourier transform of f_d, (F_d)^H is the conjugate of F_d, Y is the Fourier transform of y, and Y^H is the conjugate of Y;
Step S32: the response map of f_d is solved through the inverse Fourier transform as
R_d = F^{-1}(W_d ⊙ Z_d),
where F^{-1} is the inverse Fourier transform, z is the feature representation of the candidate region, Z_d is its Fourier transform, and ⊙ is the element-wise (dot) product.
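The training and detection steps S31 and S32 above can be sketched in numpy. This is a minimal single-channel sketch under stated assumptions: the 2-D feature array, the Gaussian label construction, and the function names are illustrative, not from the patent.

```python
import numpy as np

LAMBDA = 1e-4  # regularization coefficient given in the text

def train_filter(f, y):
    """Closed-form correlation filter in the Fourier domain (step S31):
    W = conj(F) * Y / (conj(F) * F + lambda), element-wise."""
    F = np.fft.fft2(f)
    Y = np.fft.fft2(y)
    return np.conj(F) * Y / (np.conj(F) * F + LAMBDA)

def response_map(W, z):
    """Detection response map via the inverse FFT (step S32)."""
    Z = np.fft.fft2(z)
    return np.real(np.fft.ifft2(W * Z))

# Sanity check: correlating the template with itself reproduces (approximately)
# the Gaussian label y, so the response peaks where y peaks.
np.random.seed(0)
f = np.random.rand(32, 32)
yy, xx = np.mgrid[0:32, 0:32]
y = np.exp(-((yy - 16) ** 2 + (xx - 16) ** 2) / (2 * 2.0 ** 2))
W = train_filter(f, y)
R = response_map(W, f)
peak = np.unravel_index(np.argmax(R), R.shape)
```

With λ small, the self-response is nearly the label itself, so `peak` lands at the label's center.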
Further, solving the four indices of the response map in step S4 comprises:
Step S41: the response-map peak r_max(R_d) is the maximum of the response map, r_max(R_d) = max(R_d); the larger the peak, the stronger the discriminative power of the main peak;
Step S42: the peak-to-sidelobe ratio is computed as
r_PSR(R_d) = (r_max(R_d) - μ(R_d)) / σ(R_d),
where μ(R_d) is the mean and σ(R_d) the standard deviation of the response map R_d; the larger r_PSR(R_d), the more reliable the response map;
Step S43: the average peak-to-correlation energy r_APCE(R_d) reflects the average fluctuation of the response map and the confidence of the detected target, computed as
r_APCE(R_d) = |r_max(R_d) - min(R_d)|^2 / mean_{i,j}((R_d(i, j) - min(R_d))^2),
where min(R_d) is the minimum of the response map and R_d(i, j) is the response value at coordinate (i, j);
Step S44: the second-to-first main peak ratio r_RSFMP(R_d) measures the prominence of the main mode of the response map; the larger r_RSFMP(R_d), the more prominent the main peak; it is computed as r_RSFMP(R_d) = 1 - min(r_peak2(R_d)/r_peak1(R_d), 0.5), where r_peak1(R_d) is the value of the first main peak, r_peak1(R_d) = r_max(R_d), and r_peak2(R_d) is the value of the second main peak, the first and second main peaks being non-adjacent.
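The four indices of steps S41 to S44 can be computed from a response map as below. One assumption is made explicit in the code: the patent only says the second main peak is non-adjacent to the first, so a small exclusion window (`excl`) around the first peak is used here to find it.

```python
import numpy as np

def response_indices(R, excl=2):
    """Four reliability indices of a response map (steps S41-S44).
    `excl` is an assumed exclusion radius separating the second main peak
    from the first; its value is not specified in the original."""
    r_max = float(R.max())
    r_psr = (r_max - R.mean()) / (R.std() + 1e-12)            # step S42
    r_apce = (r_max - R.min()) ** 2 / (np.mean((R - R.min()) ** 2) + 1e-12)  # step S43
    i, j = np.unravel_index(np.argmax(R), R.shape)
    masked = R.copy()                                          # hide the first peak...
    masked[max(0, i - excl):i + excl + 1, max(0, j - excl):j + excl + 1] = R.min()
    r_peak2 = float(masked.max())                              # ...to find the second
    r_rsfmp = 1.0 - min(r_peak2 / (r_max + 1e-12), 0.5)        # step S44
    return r_max, r_psr, r_apce, r_rsfmp

# Example: a dominant peak at (5, 5) and a weaker one at (15, 15).
R = np.zeros((20, 20))
R[5, 5] = 1.0
R[15, 15] = 0.3
indices = response_indices(R)
```

For this map the ratio r_peak2/r_peak1 is 0.3, so r_RSFMP = 1 - 0.3 = 0.7.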
Further, obtaining and normalizing the reliability of the different input features in step S5 proceeds as follows:
Step S51: compute the reliability of each channel, k_d' = r_max(R_d) · r_PSR(R_d) · r_APCE(R_d) · r_RSFMP(R_d), d = 1, 2, 3;
Step S52: normalize the computed channel reliabilities by
k_d = k_d' / Σ_{i=1}^{3} k_i'.
Further, the response map fused in step S6 is
R = Σ_{d=1}^{3} k_d R_d,
and the final target position is obtained as the peak of this final confidence map.
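Steps S52 and S6 together can be sketched as a reliability-weighted fusion; the function name and the return convention are illustrative assumptions.

```python
import numpy as np

def fuse_responses(responses, raw_reliabilities):
    """Normalize per-channel reliabilities (step S52) and fuse the response
    maps by reliability-weighted summation (step S6); the peak of the fused
    map gives the target position."""
    k_raw = np.asarray(raw_reliabilities, dtype=float)
    k = k_raw / k_raw.sum()                               # k_d = k_d' / sum_i k_i'
    R = sum(kd * Rd for kd, Rd in zip(k, responses))      # R = sum_d k_d * R_d
    pos = np.unravel_index(np.argmax(R), R.shape)         # target position = peak of R
    return R, pos, k

# Example: the first channel is far more reliable, so its peak wins.
R1 = np.zeros((10, 10)); R1[2, 2] = 1.0
R2 = np.zeros((10, 10)); R2[7, 7] = 1.0
R3 = np.zeros((10, 10)); R3[7, 7] = 0.5
fused, pos, k = fuse_responses([R1, R2, R3], [6.0, 1.0, 1.0])
```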
Further, in step S7 a scale-pool technique is used to estimate the target scale: the scale pool is set to S = {s_1, s_2, ..., s_k}, i.e., during tracking the candidate sizes {s_i c | s_i ∈ S} are evaluated and the one with the largest reliability coefficient is selected as the size of the current frame; the specific steps are:
Step S71: set the scale pool to S = {0.8, 0.85, 0.9, 0.95, 1.0, 1.05, 1.1, 1.15, 1.2}, evaluate the response obtained at each candidate size, and select the best-performing size ĉ_{t+1};
Step S72: to keep the target size stable, update it smoothly as c_{t+1} = (1 - γ)c_t + γĉ_{t+1}, where γ is the target size learning rate, set to 0.2.
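Steps S71 and S72 can be sketched as follows. The selection criterion is an equation image in the original, so `score` here stands in for whatever reliability measure is computed on the response at each candidate size; that stand-in, and the function names, are assumptions.

```python
SCALE_POOL = [0.8, 0.85, 0.9, 0.95, 1.0, 1.05, 1.1, 1.15, 1.2]  # step S71
GAMMA = 0.2  # target size learning rate given in the text

def select_scale(score, c):
    """Step S71 (sketch): score each candidate size s_i * c, keep the best.
    `score` is a placeholder for the response-reliability criterion."""
    return max(((s * c[0], s * c[1]) for s in SCALE_POOL), key=score)

def smooth_size(c_prev, c_est):
    """Step S72: smooth update c_{t+1} = (1 - gamma) * c_t + gamma * c_est."""
    return tuple((1 - GAMMA) * p + GAMMA * e for p, e in zip(c_prev, c_est))

# Example: a toy score that prefers widths near 110 picks the 1.1 scale.
best = select_scale(lambda c: -abs(c[0] - 110.0), (100.0, 50.0))
smoothed = smooth_size((100.0, 50.0), (120.0, 60.0))
```

The smoothing step keeps the reported size from jumping between neighboring pool entries frame to frame.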
Further, the target-model update in step S8 splits the correlation filter into a numerator and a denominator, updated separately:
W_d^t = A_d^t / B_d^t,
where A_d^t is the numerator part and B_d^t is the denominator part; the specific updating steps are:
Step S81: update the numerator part,
A_d^t = (1 - β)A_d^{t-1} + β (F_d^t)^H ⊙ Y, d = 1, 2, 3,
where β is the learning rate of the correlation filter model, set to 0.01;
Step S82: update the denominator part,
B_d^t = (1 - β)B_d^{t-1} + β ((F_d^t)^H ⊙ F_d^t + λ), d = 1, 2, 3.
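Steps S81 and S82 amount to a linear interpolation of the filter's numerator and denominator. In this sketch the A/B notation for the two parts is an assumption (the original update formulas are equation images), matching the closed-form filter W = A / B.

```python
import numpy as np

BETA = 0.01    # correlation filter model learning rate given in the text
LAMBDA = 1e-4  # regularization coefficient given in the text

def update_model(A_prev, B_prev, F, Y):
    """Steps S81-S82 (sketch): interpolate the numerator A and denominator B
    of the filter with learning rate beta, then rebuild W = A / B."""
    A = (1 - BETA) * A_prev + BETA * np.conj(F) * Y            # step S81
    B = (1 - BETA) * B_prev + BETA * (np.conj(F) * F + LAMBDA)  # step S82
    return A, B, A / B

# Example with all-ones arrays: A stays 1, B gains beta * lambda.
ones = np.ones((2, 2), dtype=complex)
A, B, W = update_model(ones, ones, ones, ones)
```

Updating the two parts separately lets the running averages of signal and energy adapt at the same rate, which is what keeps the rebuilt filter stable under appearance change.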
Compared with the prior art, the invention has the following advantages:
first, using the shallow and deep features of the VGG-Net-19 network simultaneously improves the representation ability of the model and yields more accurate target localization;
second, comprehensively measuring the reliability of each input feature's response map with different indices improves the generalization ability of the model;
third, introducing a scale-pool technique into the tracking process improves the tracking model's ability to cope with target scale change in complex scenes;
fourth, adopting a template updating mechanism in the tracking process improves the tracking model's robustness to changes in target appearance and to external interference;
finally, through the cooperation of all these components, the tracking precision and success rate are better than those of existing common comparison algorithms.
Drawings
FIG. 1 is a schematic structural view of the present invention;
FIG. 2 is a flow chart of the present invention;
FIG. 3 is a schematic diagram of the output of each layer of the VGG-Net-19 network;
FIG. 4 is a comparative simulation diagram of the integrated accuracy of the OTB100 standard data set according to the present invention and the existing common target tracking algorithm;
fig. 5 is a simulation diagram comparing the tracking success rate of the present invention and the existing common target tracking algorithm on the OTB100 standard data set.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. The described embodiments are only a part of the embodiments of the present invention, not all of them; all other embodiments obtained by a person skilled in the art from these embodiments without creative effort shall fall within the protection scope of the present invention.
As shown in fig. 1 to 5, the method for tracking a multilayer depth feature target based on reliability provided by the present invention includes the following steps:
Step S1: input the first frame I_1 of the video sequence together with the target box, then each subsequent frame I_t, t ∈ [2, N], where N is the total number of frames in the sequence, (x, y) are the coordinates of the target center, and c = (c_w, c_h) is the target scale;
Step S2: extract target features of the current frame with a pre-trained VGG-Net-19 network, using the outputs of layers conv3-4, conv4-4, and conv5-4 to describe the target, denoted f_d, d = 1, 2, 3;
Step S3: take f_d, d = 1, 2, 3 as input features of the correlation filters to obtain the corresponding response maps R_d, d = 1, 2, 3; the solving process is divided into a training part and a detection part, as follows:
Step S31: take f_d, d = 1, 2, 3 as the input feature of the correlation filter and train the filter parameters by w_d = arg min ||w_d * f_d - y||^2 + λ||w_d||^2, where * is the correlation operation, w_d are the correlation filter parameters, y is a standard two-dimensional Gaussian distribution label, and λ is the regularization coefficient, set to 0.0001; the filter parameters are obtained in the Fourier domain as
W_d = ((F_d)^H ⊙ Y) / ((F_d)^H ⊙ F_d + λ),
where F_d is the Fourier transform of f_d, (F_d)^H is the conjugate of F_d, Y is the Fourier transform of y, and Y^H is the conjugate of Y;
Step S32: the response map of f_d is solved through the inverse Fourier transform as
R_d = F^{-1}(W_d ⊙ Z_d),
where F^{-1} is the inverse Fourier transform, z is the feature representation of the candidate region, Z_d is its Fourier transform, and ⊙ is the element-wise (dot) product;
Step S4: for each response map R_d, d = 1, 2, 3, compute the peak value, peak-to-sidelobe ratio, average peak-to-correlation energy, and second-to-first main peak ratio, denoted r_max(R_d), r_PSR(R_d), r_APCE(R_d), and r_RSFMP(R_d); the specific steps for solving these four performance indices are:
Step S41: the response-map peak r_max(R_d) is the maximum of the response map, r_max(R_d) = max(R_d); the larger the peak, the stronger the discriminative power of the main peak;
Step S42: the peak-to-sidelobe ratio is computed as
r_PSR(R_d) = (r_max(R_d) - μ(R_d)) / σ(R_d),
where μ(R_d) is the mean and σ(R_d) the standard deviation of the response map R_d; the larger r_PSR(R_d), the more reliable the response map;
Step S43: the average peak-to-correlation energy r_APCE(R_d) reflects the average fluctuation of the response map and the confidence of the detected target, computed as
r_APCE(R_d) = |r_max(R_d) - min(R_d)|^2 / mean_{i,j}((R_d(i, j) - min(R_d))^2),
where min(R_d) is the minimum of the response map and R_d(i, j) is the response value at coordinate (i, j);
Step S44: the second-to-first main peak ratio r_RSFMP(R_d) measures the prominence of the main mode of the response map; the larger r_RSFMP(R_d), the more prominent the main peak; it is computed as r_RSFMP(R_d) = 1 - min(r_peak2(R_d)/r_peak1(R_d), 0.5), where r_peak1(R_d) is the value of the first main peak, r_peak1(R_d) = r_max(R_d), and r_peak2(R_d) is the value of the second main peak, the first and second main peaks being non-adjacent;
Step S5: obtain the reliability of the different input features and normalize it, denoted k_d, d = 1, 2, 3; the process is divided into reliability solving and normalization, as follows:
Step S51: compute the reliability of each channel, k_d' = r_max(R_d) · r_PSR(R_d) · r_APCE(R_d) · r_RSFMP(R_d), d = 1, 2, 3;
Step S52: normalize the computed channel reliabilities by
k_d = k_d' / Σ_{i=1}^{3} k_i';
Step S6: fuse the localization information of the three input features with reliability weights to obtain the final response map
R = Σ_{d=1}^{3} k_d R_d,
and obtain the target position as the peak of R;
Step S7: estimate and update the target scale with a scale-pool technique, obtaining the latest target scale c = (c_w, c_h); the scale pool is set to S = {s_1, s_2, ..., s_k}, i.e., during tracking the candidate sizes {s_i c | s_i ∈ S} are evaluated and the one with the largest reliability coefficient is selected as the size of the current frame; the specific steps are:
Step S71: set the scale pool to S = {0.8, 0.85, 0.9, 0.95, 1.0, 1.05, 1.1, 1.15, 1.2}, evaluate the response obtained at each candidate size, and select the best-performing size ĉ_{t+1};
Step S72: to keep the target size stable, update it smoothly as c_{t+1} = (1 - γ)c_t + γĉ_{t+1}, where γ is the target size learning rate, set to 0.2;
Step S8: because filter training is affected by target deformation, scale change, occlusion, and other factors, drift can occur, so the target model needs to be updated; specifically, the correlation filter is split into a numerator and a denominator, updated separately:
W_d^t = A_d^t / B_d^t,
where A_d^t is the numerator part and B_d^t is the denominator part; the detailed steps are:
Step S81: update the numerator part,
A_d^t = (1 - β)A_d^{t-1} + β (F_d^t)^H ⊙ Y, d = 1, 2, 3,
where β is the learning rate of the correlation filter model, set to 0.01;
Step S82: update the denominator part,
B_d^t = (1 - β)B_d^{t-1} + β ((F_d^t)^H ⊙ F_d^t + λ), d = 1, 2, 3;
Step S9: repeat steps S2 to S8 until every frame of the current sequence has been tracked.
Steps S1 to S6 constitute the target localization process, step S7 the target scale update, and step S8 the tracking model update; together they form a complete target tracking procedure. In actual tracking, steps S1 to S8 are repeated to complete the whole task, and the target's position information is obtained from steps S6 and S7.
In order to verify the effectiveness of the multilayer depth feature target tracking method of the invention (A Multiple Deep Feature Tracking Algorithm Based on Reliability, MDPR), comparison experiments were run on the OTB100 data set against commonly used target tracking algorithms, specifically:
comparison algorithm 1, CT (Zhang K, Zhang L, Yang M H. Real-Time Compressive Tracking [C]// Proceedings of European Conference on Computer Vision, 2012: 864-877.);
comparison algorithm 2, CSK (Henriques J F, Caseiro R, Martins P, et al. Exploiting the Circulant Structure of Tracking-by-Detection with Kernels [C]// Proceedings of European Conference on Computer Vision, 2012: 702-715.);
comparison algorithm 3, MOSSE_CA (Bolme D S, Beveridge J R, Draper B A, et al. Visual object tracking using adaptive correlation filters [C]// 2010 IEEE Conference on Computer Vision and Pattern Recognition, 2010: 2544-2550.);
comparison algorithm 4, SAMF (Li Y, Zhu J. A Scale Adaptive Kernel Correlation Filter Tracker with Feature Integration [C]// Proceedings of European Conference on Computer Vision, 2014: 254-265.);
comparison algorithm 5, DCF_CA (Henriques J F, Caseiro R, Martins P, et al. High-Speed Tracking with Kernelized Correlation Filters [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(3): 583-596.);
comparison algorithm 6, RPT (Li Y, Zhu J, Hoi S C H. Reliable Patch Trackers: Robust visual tracking by exploiting reliable patches [C]// Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2015: 353-361.);
comparison algorithm 7, KCF_MTSA (Bibi A, Ghanem B. Multi-template Scale-Adaptive Kernelized Correlation Filters [C]// Proceedings of IEEE International Conference on Computer Vision Workshops, 2015: 613-.).
Quantitative analysis was adopted in the comparison experiments, i.e., tracking performance was judged by computing evaluation indices: tracking precision and tracking success rate. The corresponding comparison results are shown in fig. 4 and fig. 5. In fig. 4, the abscissa is the distance threshold between the center of the target position estimated by an algorithm and the manually annotated target center, and the ordinate is the proportion of frames whose center error falls below that threshold, i.e., the precision. In fig. 5, the abscissa is the overlap threshold between the bounding box estimated by an algorithm and the manually annotated bounding box, and the ordinate is the proportion of frames whose overlap exceeds that threshold, i.e., the success rate.
As can be seen from fig. 4 and fig. 5, the MDPR of the present invention shows better tracking precision and success rate on the OTB100 data set than the above comparison algorithms, with an overall precision of 79.3% and a success rate of 75.1%.
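The two OTB evaluation indices just described can be computed as sketched below; the (x, y, w, h) box convention, the 20-pixel and 0.5 default thresholds, and the function names are assumptions chosen to match common OTB practice, not details from the patent.

```python
import numpy as np

def center_error(pred, gt):
    """Euclidean distance between predicted and ground-truth box centers."""
    (px, py, pw, ph), (gx, gy, gw, gh) = pred, gt
    return np.hypot((px + pw / 2) - (gx + gw / 2), (py + ph / 2) - (gy + gh / 2))

def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    return inter / (aw * ah + bw * bh - inter)

def precision(preds, gts, thresh=20.0):
    """Fraction of frames whose center error is below `thresh` pixels."""
    return float(np.mean([center_error(p, g) < thresh for p, g in zip(preds, gts)]))

def success(preds, gts, thresh=0.5):
    """Fraction of frames whose overlap (IoU) exceeds `thresh`."""
    return float(np.mean([iou(p, g) > thresh for p, g in zip(preds, gts)]))

# Example: two accurate frames, two failures, out of four.
gts = [(0, 0, 10, 10)] * 4
preds = [(0, 0, 10, 10), (1, 1, 10, 10), (30, 30, 10, 10), (100, 100, 10, 10)]
p, s = precision(preds, gts), success(preds, gts)
```

Sweeping `thresh` over a range of values produces the precision and success curves of the kind plotted in fig. 4 and fig. 5.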
In conclusion, in complex scenes involving illumination change, rotation, size change, and the like, the method measures the characterization ability of features from different layers by computing channel reliability and fuses their localization information to obtain more accurate target position information; during tracking, the target size is updated in real time with a scale-pool technique, and the features extracted from the previous frame are fused with the original template features to obtain a more robust tracking model.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments and that the present invention may be embodied in other specific forms without departing from its spirit or essential attributes. Furthermore, although this description refers to embodiments, not every described feature belongs to only a single embodiment; this manner of description is adopted for clarity only, and those skilled in the art should treat the description as a whole, as the embodiments may be appropriately combined to form other embodiments.

Claims (7)

1. A reliability-based multilayer depth feature target tracking method, characterized by comprising the following steps:
Step S1: input the first frame I_1 of the video sequence together with the target box, then each subsequent frame I_t, t ∈ [2, N], where N is the total number of frames in the sequence, (x, y) are the coordinates of the target center, and c = (c_w, c_h) is the target scale;
Step S2: extract target features of the current frame with a pre-trained VGG-Net-19 network, using the outputs of layers conv3-4, conv4-4, and conv5-4 to describe the target, denoted f_d, d = 1, 2, 3;
Step S3: take f_d, d = 1, 2, 3 as input features of the correlation filters to obtain the corresponding response maps R_d, d = 1, 2, 3;
Step S4: for each response map R_d, d = 1, 2, 3, compute the peak value, peak-to-sidelobe ratio, average peak-to-correlation energy, and second-to-first main peak ratio, denoted r_max(R_d), r_PSR(R_d), r_APCE(R_d), and r_RSFMP(R_d);
Step S5: obtain the reliability of the different input features and normalize it, denoted k_d, d = 1, 2, 3;
Step S6: fuse the localization information of the three input features with reliability weights to obtain the final response map R and the target position;
Step S7: estimate and update the target scale with a scale-pool technique, obtaining the latest target scale c = (c_w, c_h);
Step S8: update the target model;
Step S9: repeat steps S2 to S8 until every frame of the current sequence has been tracked.
2. The reliability-based multilayer depth feature target tracking method according to claim 1, wherein solving the response maps R_d, d = 1, 2, 3 in step S3 is divided into a training part and a detection part, as follows:
Step S31: take f_d, d = 1, 2, 3 as the input feature of the correlation filter and train the filter parameters by w_d = arg min ||w_d * f_d - y||^2 + λ||w_d||^2, where * is the correlation operation, w_d are the correlation filter parameters, y is a standard two-dimensional Gaussian distribution label, and λ is the regularization coefficient, set to 0.0001; the filter parameters are obtained in the Fourier domain as
W_d = ((F_d)^H ⊙ Y) / ((F_d)^H ⊙ F_d + λ),
where F_d is the Fourier transform of f_d, (F_d)^H is the conjugate of F_d, Y is the Fourier transform of y, and Y^H is the conjugate of Y;
Step S32: the response map of f_d is solved through the inverse Fourier transform as
R_d = F^{-1}(W_d ⊙ Z_d),
where F^{-1} is the inverse Fourier transform, z is the feature representation of the candidate region, Z_d is its Fourier transform, and ⊙ is the element-wise (dot) product.
3. The reliability-based multi-layer depth feature target tracking method according to claim 1, wherein the solving of the four indexes of the response map in step S4 includes:
step S41, the response map peak r_max(R_d) is the maximum-value index of the response map, r_max(R_d) = max(R_d); the larger the peak value, the stronger the distinguishing power of the main peak;
step S42, the peak-to-sidelobe ratio r_PSR(R_d) is calculated as
r_PSR(R_d) = (r_max(R_d) − μ(R_d)) / σ(R_d)
wherein μ(R_d) is the mean of the response map R_d and σ(R_d) is the standard deviation of R_d; the larger r_PSR(R_d), the more reliable the response map;
step S43, the average peak-to-correlation energy r_APCE(R_d) reflects the average fluctuation degree of the response map and the confidence of the detected target, and is calculated as
r_APCE(R_d) = |r_max(R_d) − min(R_d)|² / mean_{i,j}[(R_d(i, j) − min(R_d))²]
wherein min(R_d) is the minimum value of the response map and R_d(i, j) is the response value at coordinate position (i, j) in the response map;
step S44, the ratio of the secondary main peak to the main peak r_RSFMP(R_d) measures how prominent the dominant mode of the response map is; the larger r_RSFMP(R_d), the more prominent the main peak; it is calculated as r_RSFMP(R_d) = 1 − min(r_peak2(R_d)/r_peak1(R_d), 0.5), wherein r_peak1(R_d) is the peak of the first main peak in the response map, r_peak1(R_d) = r_max(R_d), and r_peak2(R_d) is the peak of the second main peak in the response map, the first and second main peaks being non-adjacent.
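A minimal sketch of the four indices of steps S41–S44; the 2-pixel exclusion window used to find a non-adjacent secondary peak is an assumption, since the claim does not fix the neighborhood size:

```python
import numpy as np

def response_indices(R):
    # S41: peak of the response map.
    r_max = R.max()
    # S42: peak-to-sidelobe ratio.
    r_psr = (r_max - R.mean()) / (R.std() + 1e-12)
    # S43: average peak-to-correlation energy.
    r_apce = (r_max - R.min()) ** 2 / np.mean((R - R.min()) ** 2)
    # S44: mask a small neighborhood around the main peak, then take the
    # largest remaining value as the (non-adjacent) secondary main peak.
    i, j = np.unravel_index(R.argmax(), R.shape)
    masked = R.copy()
    masked[max(i - 2, 0):i + 3, max(j - 2, 0):j + 3] = -np.inf
    r_rsfmp = 1.0 - min(masked.max() / r_max, 0.5)
    return r_max, r_psr, r_apce, r_rsfmp
```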
4. The reliability-based multi-layer depth feature target tracking method according to claim 1, wherein obtaining the reliability of the different input features and normalizing in step S5 comprises the following concrete steps:
step S51, calculating the reliability of each channel: k_d' = r_max(R_d)·r_PSR(R_d)·r_APCE(R_d)·r_RSFMP(R_d), d = 1, 2, 3;
step S52, normalizing the calculated reliability of each channel by
k_d = k_d' / Σ_{i=1}^{3} k_i'.
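Steps S51–S52 in sketch form, reading the normalization as dividing each channel's product of indices by the sum over the three channels (an assumption, since the normalization formula itself is an image in the source):

```python
def channel_reliability(indices_per_channel):
    # S51: reliability of each channel as the product of its four indices.
    k = [r_max * r_psr * r_apce * r_rsfmp
         for (r_max, r_psr, r_apce, r_rsfmp) in indices_per_channel]
    # S52: normalize so the reliability coefficients sum to 1.
    total = sum(k)
    return [ki / total for ki in k]
```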
5. The reliability-based multi-layer depth feature target tracking method according to claim 1, wherein the fused response map in step S6 is
R = Σ_{d=1}^{3} k_d · R_d
and the final positioning information is obtained as the peak of the final confidence map.
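The fusion of step S6 in sketch form — a weighted sum of the per-channel response maps, with the target located at the peak of the fused map:

```python
import numpy as np

def fuse_responses(responses, k):
    # Weighted fusion R = sum_d k_d * R_d; position = peak of the fused map.
    R = sum(kd * Rd for kd, Rd in zip(k, responses))
    return R, np.unravel_index(np.argmax(R), R.shape)
```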
6. The reliability-based multi-layer depth feature target tracking method according to claim 1, wherein in step S7 a scale pool technique is used to estimate the target scale, the scale pool being set to S = {s_1, s_2, ..., s_k}; that is, during tracking the candidate sizes s_i·c (s_i ∈ S, c being the current target size) are each evaluated, and the one with the maximum reliability coefficient is selected as the size of the current frame, with the following concrete calculation steps:
step S71, the scale pool is set to S = {0.8, 0.85, 0.9, 0.95, 1.0, 1.05, 1.1, 1.15, 1.2}, each candidate size is estimated separately by
c̃_{t+1} = arg max_{s_i·c, s_i ∈ S} r_max(R(s_i·c))
and the size with the best performance is selected;
step S72, to keep the target size stable, the size is updated smoothly: c_{t+1} = (1 − γ)·c_t + γ·c̃_{t+1}, wherein γ is the target size learning rate, taken as 0.2.
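Steps S71–S72 as a sketch; `reliability_at` is a hypothetical stand-in for re-running detection at each candidate size and scoring it, since the claim only states that the size with the maximum reliability coefficient is kept:

```python
def estimate_scale(c, reliability_at, gamma=0.2,
                   scales=(0.8, 0.85, 0.9, 0.95, 1.0, 1.05, 1.1, 1.15, 1.2)):
    # S71: score each candidate size s_i * c and keep the best one.
    best = max((s * c for s in scales), key=reliability_at)
    # S72: smooth the size update with learning rate gamma (0.2 in the claim).
    return (1 - gamma) * c + gamma * best
```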
7. The reliability-based multi-layer depth feature target tracking method according to claim 1, wherein updating the target model in step S8 is to split the correlation filter into a numerator part and a denominator part which are updated separately,
W_d = A_d / (B_d + λ)
wherein A_d = Y ⊙ (F_d)^H is the numerator part and B_d = F_d ⊙ (F_d)^H is the denominator part, with the following concrete updating steps:
step S81, updating the numerator part:
A_d^t = (1 − β)·A_d^{t−1} + β·Y ⊙ (F_d^t)^H
wherein β is the learning rate of the correlation filter model, taken as 0.01;
step S82, updating the denominator part:
B_d^t = (1 − β)·B_d^{t−1} + β·F_d^t ⊙ (F_d^t)^H.
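Steps S81–S82 as a sketch, using the linear-interpolation update with learning rate β = 0.01; the exact conjugation layout of the numerator and denominator terms is an assumption consistent with a MOSSE-style filter:

```python
import numpy as np

def update_model(A_prev, B_prev, F_t, Y, beta=0.01):
    # S81: numerator update with learning rate beta.
    A = (1 - beta) * A_prev + beta * (Y * np.conj(F_t))
    # S82: denominator update with the same learning rate.
    B = (1 - beta) * B_prev + beta * (F_t * np.conj(F_t))
    return A, B
```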
CN202110330225.3A 2021-03-29 2021-03-29 Multilayer depth feature target tracking method based on reliability Pending CN113077490A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110330225.3A CN113077490A (en) 2021-03-29 2021-03-29 Multilayer depth feature target tracking method based on reliability

Publications (1)

Publication Number Publication Date
CN113077490A true CN113077490A (en) 2021-07-06

Family

ID=76611161

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114359330A (en) * 2021-11-01 2022-04-15 中国人民解放军陆军工程大学 Long-term target tracking method and system fusing depth information

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107943837A (en) * 2017-10-27 2018-04-20 江苏理工学院 A kind of video abstraction generating method of foreground target key frame
CN109410247A (en) * 2018-10-16 2019-03-01 中国石油大学(华东) A kind of video tracking algorithm of multi-template and adaptive features select
CN111311647A (en) * 2020-01-17 2020-06-19 长沙理工大学 Target tracking method and device based on global-local and Kalman filtering

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
小小菜鸟一只: "Target tracking algorithm — HCF: Hierarchical Convolutional Features for Visual Tracking", pages 1 - 6, Retrieved from the Internet <URL:https://blog.csdn.net/crazyice521/article/details/65935753> *
尹明锋 et al.: "Multi-feature visual tracking based on weighted spatio-temporal context learning", Journal of Chinese Inertial Technology, vol. 27, no. 1, pages 43 - 50 *
尹明锋 et al.: "Particle filter visual tracking based on an improved spatial histogram similarity measure (in English)", Journal of Chinese Inertial Technology, vol. 26, no. 3, pages 359 - 365 *
尹明锋 et al.: "Multi-scale background-aware correlation filter tracking algorithm based on channel reliability", Acta Optica Sinica, vol. 39, no. 5, pages 1 - 11 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210706