
CN114882078A - Visual tracking method based on position prediction - Google Patents


Info

Publication number
CN114882078A
Authority
CN
China
Prior art keywords
target
algorithm
tracking
fdsst
filter
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210509658.XA
Other languages
Chinese (zh)
Inventor
何士举
张伟林
陈桥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei Zhongke Shengu Technology Development Co ltd
Original Assignee
Hefei Zhongke Shengu Technology Development Co ltd
Application filed by Hefei Zhongke Shengu Technology Development Co ltd
Priority to CN202210509658.XA
Publication of CN114882078A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G06T 7/277 Analysis of motion involving stochastic approaches, e.g. using Kalman filters

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a visual tracking method based on position prediction, which mainly comprises the following steps: S1, acquiring image frame data of the surrounding environment by a visual sensor carried on the mobile robot; S2, initializing an FDSST tracker according to the initial target position [x(0), y(0), w(0), h(0)], simultaneously initializing the Kalman filtering parameters P(0), R(0) and Q(0), and tracking the target by the FDSST algorithm. The invention first uses the correlation response map generated from the image block and, through the criterion T = y_max^2 / E_y (formula (4.2.1)), qualitatively judges whether the target object is occluded in the mobile robot's vision; this tracking-result evaluation criterion, introduced on the basis of the traditional algorithm, is the foundation for predicting the target position at the next moment based on the Kalman algorithm.

Description

Visual tracking method based on position prediction
Technical Field
The invention relates to the technical field of mobile robot vision following, in particular to a vision tracking method based on position prediction.
Background
With the development of science and technology, mobile robots are widely applied in fields such as smart homes, industrial production and social services. In all of these fields the mobile robot must interact with its surrounding environment, and obtaining information such as the shape, position and speed of a target object in real time is the basis of human-computer interaction. Vision-based target tracking uses a visual sensor to obtain an image frame sequence of the mobile robot's surroundings and then analyzes this sequence to obtain the relevant information about the target object.
Although correlation-filter target tracking algorithms handle illumination changes and target deformation well, they still perform poorly when the target is occluded. The main reason is that the traditional algorithm updates the target template from the tracking result of every frame at a fixed learning rate without evaluating that result; when an object in the field of view is occluded, the target template is easily contaminated, so the tracking eventually fails.
Disclosure of Invention
The invention aims to provide a visual tracking method based on position prediction, so as to solve the problems raised in the background: the traditional algorithm updates the target template from the tracking result of each frame at a fixed learning rate and does not evaluate the tracking result of each frame, so that when an object in the field of view is occluded the target template is easily contaminated and the tracking finally fails.
In order to achieve the purpose, the invention provides the following technical scheme: a visual tracking method based on position prediction mainly comprises the following steps:
s1, acquiring image frame data of the surrounding environment by a visual sensor carried on the mobile robot;
s2, initializing an FDSST tracker according to the initial target position [ x (0), y (0), w (0), h (0) ] and simultaneously initializing parameters P (0), R (0) and Q (0) of Kalman filtering, and tracking the target by an FDSST algorithm;
s3, according to the formula
Figure BDA0003638812070000021
Calculating a response map in a subsequent frame:
evaluating the standard according to the tracking result, and using the formula
Figure BDA0003638812070000022
Judging the tracking result of each frame, and if the tracking result is not credible and indicates that environmental interference exists, executing step S5; if the tracking result is credible, the environmental interference is ended, and the position with the maximum correlation response in the formula is output as the position of the target in a new frame [ x (t), y (t), w (t), h (t)]And performs step S6;
s5, estimating the position of the target in a new frame by using a Kalman filtering prediction algorithm, outputting tracking results [ x (t), y (t), w (t), h (t) ], and returning to the step S3;
S6, updating the translation filter and the scale filter models in the FDSST tracking algorithm according to formula (2.2.6), updating the Kalman filter according to the tracking result, and returning to step S3.
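For readability, the control flow of steps S1 to S6 can be sketched as a short Python loop. This is a minimal sketch only: the tracker and filter objects and their method names (init, detect, update, predict, correct) are hypothetical stand-ins for the FDSST and Kalman routines detailed below, and the 0.25 occlusion threshold is the value given in step S501.

```python
import numpy as np

T_MIN = 0.25  # occlusion threshold on criterion T, per step S501

def track(frames, init_box, fdsst, kalman):
    """Control flow of steps S1-S6; fdsst and kalman are caller-supplied
    objects standing in for the FDSST tracker and the Kalman filter."""
    fdsst.init(frames[0], init_box)     # S2: initialize the FDSST tracker
    kalman.init(init_box)               # S2: initialize P(0), R(0), Q(0)
    boxes = [init_box]
    for frame in frames[1:]:
        box, response = fdsst.detect(frame)   # S3: correlation response map
        T = response.max() ** 2 / np.sum(np.abs(response) ** 2)  # S4: (4.2.1)
        if T < T_MIN:                   # S5: occluded, fall back to Kalman
            box = kalman.predict()
        else:                           # S6: result credible, update models
            fdsst.update(frame, box)
            kalman.correct(box)
        boxes.append(box)
    return boxes
```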
Preferably, in step S2, the FDSST algorithm is specifically described as follows:
s201, FDSST mainly comprises a translation filter and a scale correlation filter, wherein the translation filter is responsible for finding the position of a target object in the next image frame, and the translation filter is used for detecting the size change of the target object;
s202, training samples f from a large number of image blocks by the target of the translation filter j The optimal correlation filter h is obtained through middle learning and is obtained through the following functions:
Figure BDA0003638812070000023
wherein gamma represents the coefficient of the regular term, g represents the expected correlation output with the training sample f, i ∈ {1, 2 i Comprises the following steps:
Figure BDA0003638812070000024
wherein capital letters are the Fourier transforms of the corresponding lower-case quantities and * denotes the complex conjugate; in engineering practice, in order to reduce the amount of computation and obtain a robust approximation, H^l is split into a numerator A^l and a denominator B that are updated iteratively:
A_t^l = (1 - μ) A_{t-1}^l + μ G_t* F_t^l
B_t = (1 - μ) B_{t-1} + μ Σ_{k=1}^{d} (F_t^k)* F_t^k   (2.2.3)
H_t^l = A_t^l / ( B_t + γ )
wherein μ is the learning rate, and t is the index of the corresponding image frame;
the FDSST tracking algorithm performs PCA dimensionality reduction on the extracted features in order to further reduce the amount of computation, and the extracted image-block training sample (template) is then updated as:
α_t = (1 - μ) α_{t-1} + μ f_t   (2.2.4)
the FDSST algorithm utilizes a projection matrix P_t to project the original HOG features into a low-dimensional feature subspace, wherein the projection matrix P_t has dimension d̃ × d and d̃ is the compressed feature dimension; the algorithm obtains P_t by minimizing the reconstruction error of the training sample α_t:
P_t = argmin_P || α_t - P^T P α_t ||^2,  subject to P P^T = I   (2.2.5)
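As an illustration of how the projection matrix of formula (2.2.5) can be obtained, the following NumPy sketch performs PCA over the feature channels of the template α. The function name and the eigen-decomposition route are assumptions consistent with standard practice, not a prescription of this patent:

```python
import numpy as np

def projection_matrix(alpha, d_tilde):
    """PCA projection for formula (2.2.5): keep the top d_tilde principal
    directions of the template alpha (shape (d, H, W)), so that P^T P alpha
    is the best rank-d_tilde reconstruction of alpha."""
    d = alpha.shape[0]
    X = alpha.reshape(d, -1)              # one row per feature channel
    C = X @ X.T                           # d x d channel autocorrelation
    eigvals, eigvecs = np.linalg.eigh(C)  # eigenvalues in ascending order
    P = eigvecs[:, ::-1][:, :d_tilde].T   # top components, shape (d_tilde, d)
    return P
```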
The updates of A and B on the PCA-compressed features then become:
Ã_t^l = (1 - μ) Ã_{t-1}^l + μ G_t* F̃_t^l
B̃_t = (1 - μ) B̃_{t-1} + μ Σ_{k=1}^{d̃} (F̃_t^k)* F̃_t^k   (2.2.6)
wherein F̃_t = F{ P_t α_t } and F{·} denotes the Fourier transform operation.
For an image block z in the next frame of a given video image sequence, the correlation response score is calculated by the following formula, and the target position in the next frame is the position with the maximum score:
y_t = F^{-1}{ ( Σ_{l=1}^{d̃} (Ã_{t-1}^l)* Z̃_t^l ) / ( B̃_{t-1} + γ ) }   (2.2.7)
wherein Z̃_t = F{ P_{t-1} z };
the translation correlation filter h_trans in the FDSST algorithm extracts HOG features from the cyclically shifted training samples and then updates the filter with formula (2.2.6); in the next frame of the image sequence, the feature image block z of the predicted target region is extracted first, and the target is then located at the position of maximum correlation response score using formula (2.2.7);
s203, the scale correlation filter of the FDSST algorithm is realized by learning an independent 1-dimensional scale correlation filter, based on the assumption that the change of the scale between two frames is smaller than the change of the position, after the translation correlation filter positions the position of a target in a new frame, the scale correlation filter extracts image blocks with different scales by taking the current target position as the center, and because the characteristic dimensions of the image blocks in the learning and detection formula of the translation correlation filter are arbitrary, the learning and detection of the scale correlation filter in the FDSST algorithm use the same mode.
Preferably, in step S4, the tracking result evaluation criterion specifically includes the following:
s401, according to the response graph of the image frame obtained in the step S3, whether the target object is occluded or not can be qualitatively judged by judging whether the response graph has multiple peaks, if the current response graph has a multiple peak state and is violently vibrated, the tracking result based on the FDSST visual tracking algorithm is likely to be wrong at the moment, and thus the tracking is failed;
s402, according to the data of the relevant response diagram and the introduced tracking result evaluation standard, the specific formula is as follows:
Figure BDA0003638812070000041
wherein y is max Indicating the magnitude of the peak in the correlation response plot, E y The definition is as follows:
E y =∑ w,h |y w,h | 2 (4.2.2)
through the formula (4.2.1), whether obstacles appear in the sight line of the vision sensor of the mobile robot can be qualitatively judged, if the T value is close to 1, the target object is judged not to be shielded, and if the T value is small, the target object at the moment is considered to be shielded.
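A direct NumPy rendering of formulas (4.2.1) and (4.2.2) follows, under the assumption, consistent with the stated behavior that T approaches 1 for an unoccluded single-peak response, that T is the ratio of the squared peak value to the total energy of the response map:

```python
import numpy as np

def occlusion_criterion(response):
    """Formula (4.2.1): T = y_max^2 / E_y. A sharp single peak concentrates
    the energy E_y (formula (4.2.2)) at the maximum, so T stays close to 1
    while the target is visible; a diffuse multi-peak map drives T toward 0."""
    y_max = response.max()
    E_y = np.sum(np.abs(response) ** 2)   # formula (4.2.2)
    return y_max ** 2 / E_y
```

With this reading, the threshold T < 0.25 used in step S501 flags frames whose response energy is spread over many locations.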
Preferably, in step S5, the specific steps of predicting the position of the target object based on the Kalman filtering algorithm are as follows:
s501, according to the judgment result in the step S4, if the value of the tracking result evaluation standard value T is less than 0.25, the target object is judged to be shielded at the moment, and the tracking of the FDSST visual tracking algorithm is invalid at the moment, and the Kalman filtering algorithm is adopted to estimate the position of the target object at the next moment;
s502, Kalman filtering is to establish a model for the current motion state of a target so as to predict the motion state of the target at the next moment in the historical state observed quantity of the given target;
at time t, the state information in the image block includes: the coordinates (x, y) of the target centroid, the velocity v_x of the moving target along the x-axis, the velocity v_y along the y-axis, the width w of the target frame, and the height h of the target frame. Because the tracking algorithm can only identify the position and size of the moving target in a video image frame, the observed quantities are the coordinates (z_x, z_y) of the identified target's centroid together with the width w and height h of the target frame. To simplify the model and ensure the real-time performance of the tracking algorithm, the target is assumed to move linearly, and its motion is modeled using the above information:
the state and observed quantity of the target at time t are respectively expressed as:
X(t)=[x(t),y(t),v_x(t),v_y(t),w(t),h(t)] (5.3.1)
Z(t)=[z_x(t),z_y(t),z_w(t),z_h(t)] (5.3.2)
then, at time t+1, the state and the observation of the system are given by the state-space equations:
X(t+1|t)=A(t+1|t)X(t)+w(t+1) (5.3.3)
Z(t+1|t)=HX(t+1|t)+v(t+1) (5.3.4)
the state transition matrix A(t+1|t) and the system measurement matrix H can be derived from Newton's second law as follows:
A(t+1|t) =
[ 1 0 d_t 0 0 0
  0 1 0 d_t 0 0
  0 0 1 0 0 0
  0 0 0 1 0 0
  0 0 0 0 1 0
  0 0 0 0 0 1 ]   (5.3.5)
H =
[ 1 0 0 0 0 0
  0 1 0 0 0 0
  0 0 0 0 1 0
  0 0 0 0 0 1 ]   (5.3.6)
wherein d_t represents the time interval between image frames, and w(t+1) and v(t+1) are assumed to be zero-mean white Gaussian noise.
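The matrices of formulas (5.3.5)/(5.3.6) and the predict/correct cycle translate directly into NumPy. The sketch below is illustrative only; the process- and measurement-noise covariances Q and R (for w(t+1) and v(t+1)) are left to the user, since their values are not fixed here:

```python
import numpy as np

def kalman_matrices(dt):
    """Constant-velocity model for X = [x, y, v_x, v_y, w, h] observed as
    Z = [z_x, z_y, z_w, z_h]: transition A (5.3.5), measurement H (5.3.6)."""
    A = np.eye(6)
    A[0, 2] = dt                 # x <- x + v_x * d_t
    A[1, 3] = dt                 # y <- y + v_y * d_t
    H = np.zeros((4, 6))
    H[0, 0] = H[1, 1] = 1.0      # observe the centroid (x, y)
    H[2, 4] = H[3, 5] = 1.0      # observe the box size (w, h)
    return A, H

def predict(x, P, A, Q):
    """Time update (5.3.3): X(t+1|t) = A X(t); P <- A P A^T + Q."""
    return A @ x, A @ P @ A.T + Q

def correct(x, P, z, H, R):
    """Measurement update, applied when the FDSST result is credible."""
    S = H @ P @ H.T + R                  # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)       # Kalman gain
    x = x + K @ (z - H @ x)
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P
```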
Compared with the prior art, the invention has the beneficial effects that:
The invention first uses the correlation response map generated from the image block and, through the formula
T = y_max^2 / E_y   (4.2.1)
qualitatively judges whether the target object is occluded in the mobile robot's vision. The tracking-result evaluation criterion introduced on the basis of the traditional algorithm is the foundation for predicting the target position at the next moment based on the Kalman algorithm; the position information of the target object at the next moment is then predicted through the prediction and update steps of the Kalman filtering algorithm, based on Newton's mechanics formulas.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in FIG. 1, a visual tracking method based on position prediction includes the following steps:
(1) a visual sensor carried on the mobile robot acquires image frame data of the surrounding environment;
(2) initializing an FDSST tracker according to an initial target position [x(0), y(0), w(0), h(0)] and simultaneously initializing parameters P(0), R(0), Q(0) of Kalman filtering, and tracking the target by an FDSST algorithm;
(3) calculating a response map in a subsequent frame according to formula (2.2.7);
(4) judging the tracking result of each frame by using formula (4.2.1) according to the tracking-result evaluation criterion; if the tracking result is not credible, indicating that environmental interference exists, executing step (5); if the tracking result is credible, the environmental interference has ended: outputting the position of maximum correlation response in the formula as the position [x(t), y(t), w(t), h(t)] of the target in the new frame, and executing step (6);
(5) estimating the position of the target in a new frame by using a Kalman filtering prediction algorithm, outputting the tracking result [x(t), y(t), w(t), h(t)], and returning to step (3);
(6) updating the translation filter and the scale filter models in the FDSST tracking algorithm according to formula (2.2.6), updating the Kalman filter according to the tracking result, and returning to step (3).
In the step (2), the FDSST algorithm is specifically described as follows:
(2.1) FDSST is mainly composed of a translation filter and a scale correlation filter, the former being responsible for finding the position of the target object in the next image frame and the latter for detecting the change in the size of the target object.
(2.2) The goal of the translation filter is to learn an optimal correlation filter h from a large number of image-block training samples f_j, obtained by minimizing the following function:
ε = || Σ_{l=1}^{d} h^l ⋆ f^l - g ||^2 + γ Σ_{l=1}^{d} ||h^l||^2   (2.2.1)
where γ represents the coefficient of the regularization term, g represents the desired correlation output for the training sample f, and l ∈ {1, 2, ..., d} indexes the feature dimensions. The optimal translation correlation filter H^l obtained from this formula is:
H^l = ( G* F^l ) / ( Σ_{k=1}^{d} (F^k)* F^k + γ )   (2.2.2)
where capital letters are the Fourier transforms of their corresponding lower-case quantities and * denotes the complex conjugate. Meanwhile, in engineering practice, in order to reduce the amount of computation and obtain a robust approximation, H^l is split into a numerator A^l and a denominator B that are updated iteratively:
A_t^l = (1 - μ) A_{t-1}^l + μ G_t* F_t^l
B_t = (1 - μ) B_{t-1} + μ Σ_{k=1}^{d} (F_t^k)* F_t^k   (2.2.3)
H_t^l = A_t^l / ( B_t + γ )
where μ is the learning rate and t is the index of the corresponding image frame.
The FDSST tracking algorithm performs PCA dimensionality reduction on the extracted features in order to further reduce the amount of computation, and the extracted image-block training sample (template) is then updated as:
α_t = (1 - μ) α_{t-1} + μ f_t   (2.2.4)
The FDSST algorithm projects the original HOG features into a low-dimensional feature subspace using a projection matrix P_t, where P_t has dimension d̃ × d and d̃ is the compressed feature dimension. The algorithm obtains P_t by minimizing the reconstruction error of the training sample α_t:
P_t = argmin_P || α_t - P^T P α_t ||^2,  subject to P P^T = I   (2.2.5)
The updates of A and B on the PCA-compressed features then become:
Ã_t^l = (1 - μ) Ã_{t-1}^l + μ G_t* F̃_t^l
B̃_t = (1 - μ) B̃_{t-1} + μ Σ_{k=1}^{d̃} (F̃_t^k)* F̃_t^k   (2.2.6)
where F̃_t = F{ P_t α_t } and F{·} denotes the Fourier transform operation.
For an image block z in the next frame of a given video image sequence, the correlation response score is calculated by the following formula, and the target position in the next frame is the position with the maximum score:
y_t = F^{-1}{ ( Σ_{l=1}^{d̃} (Ã_{t-1}^l)* Z̃_t^l ) / ( B̃_{t-1} + γ ) }   (2.2.7)
where Z̃_t = F{ P_{t-1} z }.
The translation correlation filter h_trans in the FDSST algorithm extracts HOG features from the cyclically shifted training samples and then updates the filter using equation (2.2.6); the feature image block z of the predicted target region is first extracted in the next frame of the image sequence, and the target is then located at the position of maximum correlation response score using equation (2.2.7).
(2.3) The scale correlation filter of the FDSST algorithm is realized by learning an independent 1-dimensional scale correlation filter, based on the assumption that the change of scale between two frames is smaller than the change of position. After the translation correlation filter locates the target in a new frame, the scale correlation filter extracts image blocks at different scales centered on the current target position. Since the feature dimension of the image blocks in the learning and detection formulas of the translation correlation filter is arbitrary, the learning and detection of the scale correlation filter in the FDSST algorithm use the same scheme.
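As an illustration of the scale search in (2.3), the sketch below builds the multi-scale sample pyramid around the current position. The 33 scales and the scale step a = 1.02 are common fDSST defaults assumed here (they are not stated in this document), and raw grayscale pixels stand in for the HOG features for brevity:

```python
import numpy as np
import cv2  # OpenCV, assumed available for cropping and resizing

def scale_samples(frame, cx, cy, w, h, n_scales=33, a=1.02,
                  template=(32, 32)):
    """Extract patches of size a^n * (w, h) centered on (cx, cy) and resize
    each one to a common template, so every scale contributes one feature
    column to the 1-dimensional scale correlation filter."""
    cols = []
    for n in range(n_scales):
        s = a ** (n - (n_scales - 1) / 2)         # scale factor a^n
        sw, sh = max(2, int(s * w)), max(2, int(s * h))
        x0, y0 = int(cx - sw / 2), int(cy - sh / 2)
        patch = frame[max(0, y0):y0 + sh, max(0, x0):x0 + sw]
        if patch.size == 0:                       # patch fell off the image
            patch = np.zeros((sh, sw), dtype=frame.dtype)
        cols.append(cv2.resize(patch, template).astype(np.float32).ravel())
    return np.stack(cols, axis=1)                 # (features, n_scales)
```

Each column is then correlated with the learned 1-D scale filter, and the scale with the maximum response is kept.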
In the step (4), the evaluation criterion of the tracking result specifically includes the following:
(4.1) From the response map of the image frame obtained in step (3), whether the target object is occluded can be judged qualitatively by checking whether the response map has multiple peaks; if the current response map is in a multi-peak state and oscillates violently, the tracking result of the FDSST visual tracking algorithm is likely to be wrong at this moment, so the tracking fails.
(4.2) Based on the data of the correlation response map, a tracking-result evaluation criterion is introduced, with the specific formula:
T = y_max^2 / E_y   (4.2.1)
where y_max indicates the magnitude of the peak in the correlation response map, and E_y is defined as:
E_y = Σ_{w,h} | y_{w,h} |^2   (4.2.2)
Through formula (4.2.1), whether an obstacle has appeared in the line of sight of the mobile robot's visual sensor can be judged qualitatively: if the value of T is close to 1, the target object is judged not to be occluded; if the value of T is small, the target object is considered occluded at that moment.
In step (5), the specific steps of predicting the position of the target object based on the Kalman filtering algorithm are as follows:
(5.1) According to the judgment result of step (4), if the value of the tracking-result evaluation criterion T is less than 0.25, the target object is judged to be occluded and the tracking of the FDSST visual tracking algorithm is invalid at this moment, so the Kalman filtering algorithm is adopted to estimate the position of the target object at the next moment.
(5.2) Kalman filtering, proposed in 1960, provides an efficient recursive computation method that estimates the state of a process while minimizing the estimation error. For state estimation of a target object, Kalman filtering models the current motion state of the target to predict its motion state at the next moment, given the historical state observations of the target.
(5.3) At time t, the state information in the image block includes: the coordinates (x, y) of the target centroid, the velocity v_x of the moving target along the x-axis, the velocity v_y along the y-axis, the width w of the target frame, and the height h of the target frame. Since the tracking algorithm can only identify the position and size of a moving object in a video image frame, the observations are the coordinates (z_x, z_y) of the identified target's centroid together with the width w and height h of the target frame. To simplify the model and ensure the real-time performance of the tracking algorithm, linear motion of the target is assumed here, and the motion of the target is modeled using the above information:
the state and observed quantity of the target at time t are respectively expressed as:
X(t)=[x(t),y(t),v_x(t),v_y(t),w(t),h(t)] (5.3.1)
Z(t)=[z_x(t),z_y(t),z_w(t),z_h(t)] (5.3.2)
then, at time t+1, the state and the observation of the system are given by the state-space equations:
X(t+1|t)=A(t+1|t)X(t)+w(t+1) (5.3.3)
Z(t+1|t)=HX(t+1|t)+v(t+1) (5.3.4)
the state transition matrix A(t+1|t) and the system measurement matrix H can be derived from Newton's second law as follows:
A(t+1|t) =
[ 1 0 d_t 0 0 0
  0 1 0 d_t 0 0
  0 0 1 0 0 0
  0 0 0 1 0 0
  0 0 0 0 1 0
  0 0 0 0 0 1 ]   (5.3.5)
H =
[ 1 0 0 0 0 0
  0 1 0 0 0 0
  0 0 0 0 1 0
  0 0 0 0 0 1 ]   (5.3.6)
where d_t represents the time interval between image frames, and w(t+1) and v(t+1) are assumed to be zero-mean white Gaussian noise.
In conclusion, by combining the tracking-result evaluation criterion with position prediction based on the Kalman filtering algorithm, the target position can still be tracked when the target object is occluded by objects in the surrounding environment. Introducing a position prediction algorithm on the basis of the traditional visual tracking algorithm effectively solves target position tracking in extreme environments, so that the tracking of the target object is protected from outside interference.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (4)

1. A visual tracking method based on position prediction, characterized in that the method mainly comprises the following steps:
s1, acquiring image frame data of the surrounding environment by a visual sensor carried on the mobile robot;
s2, initializing an FDSST tracker according to the initial target position [ x (0), y (0), w (0), h (0) ] and simultaneously initializing parameters P (0), R (0) and Q (0) of Kalman filtering, and tracking the target by an FDSST algorithm;
s3, according to the formula
Figure FDA0003638812060000011
Calculating a response map in a subsequent frame;
s4, evaluating the standard according to the tracking result and using the formula
Figure FDA0003638812060000012
Judging the tracking result of each frame, and if the tracking result is not credible and indicates that environmental interference exists, executing a step S5; if the tracking result is credible, the environmental interference is ended, and the position with the maximum correlation response in the formula is output as the position of the target in a new frame [ x (t), y (t), w (t), h (t)]And performs step S6;
s5, estimating the position of the target in a new frame by using a Kalman filtering prediction algorithm, outputting tracking results [ x (t), y (t), w (t), h (t) ], and returning to the step S3;
S6, updating the translation filter and the scale filter models in the FDSST tracking algorithm according to formula (2.2.6), updating the Kalman filter according to the tracking result, and returning to step S3.
2. The visual tracking method based on position prediction according to claim 1, characterized in that: in step S2, the FDSST algorithm is specifically described as follows:
s201, FDSST mainly comprises a translation filter and a scale correlation filter, wherein the translation filter is responsible for finding the position of a target object in the next image frame, and the translation filter is used for detecting the size change of the target object;
s202, translation filteringThe goal of the machine is to train samples f from a large number of image blocks j The optimal correlation filter h is obtained through middle learning and is obtained through the following functions:
Figure FDA0003638812060000013
wherein gamma represents the coefficient of the regular term, g represents the expected correlation output with the training sample f, i ∈ {1, 2 i Comprises the following steps:
Figure FDA0003638812060000021
wherein capital letters are the Fourier transforms of the corresponding lower-case quantities and * denotes the complex conjugate; in engineering practice, in order to reduce the amount of computation and obtain a robust approximation, H^l is split into a numerator A^l and a denominator B that are updated iteratively:
A_t^l = (1 - μ) A_{t-1}^l + μ G_t* F_t^l
B_t = (1 - μ) B_{t-1} + μ Σ_{k=1}^{d} (F_t^k)* F_t^k   (2.2.3)
H_t^l = A_t^l / ( B_t + γ )
wherein μ is the learning rate, and t is the index of the corresponding image frame;
the FDSST tracking algorithm performs PCA dimensionality reduction on the extracted features in order to further reduce the amount of computation, and the extracted image-block training sample (template) is then updated as:
α_t = (1 - μ) α_{t-1} + μ f_t   (2.2.4)
the FDSST algorithm utilizes a projection matrix P_t to project the original HOG features into a low-dimensional feature subspace, wherein the projection matrix P_t has dimension d̃ × d and d̃ is the compressed feature dimension; the algorithm obtains P_t by minimizing the reconstruction error of the training sample α_t:
P_t = argmin_P || α_t - P^T P α_t ||^2,  subject to P P^T = I   (2.2.5)
The updates of A and B on the PCA-compressed features then become:
Ã_t^l = (1 - μ) Ã_{t-1}^l + μ G_t* F̃_t^l
B̃_t = (1 - μ) B̃_{t-1} + μ Σ_{k=1}^{d̃} (F̃_t^k)* F̃_t^k   (2.2.6)
wherein F̃_t = F{ P_t α_t } and F{·} denotes the Fourier transform operation.
For an image block z in the next frame of a given video image sequence, the correlation response score is calculated by the following formula, and the target position in the next frame is the position with the maximum score:
y_t = F^{-1}{ ( Σ_{l=1}^{d̃} (Ã_{t-1}^l)* Z̃_t^l ) / ( B̃_{t-1} + γ ) }   (2.2.7)
wherein Z̃_t = F{ P_{t-1} z };
the translation correlation filter h_trans in the FDSST algorithm extracts HOG features from the cyclically shifted training samples and then updates the filter with formula (2.2.6); in the next frame of the image sequence, the feature image block z of the predicted target region is extracted first, and the target is then located at the position of maximum correlation response score using formula (2.2.7);
s203, the scale correlation filter of the FDSST algorithm is realized by learning an independent 1-dimensional scale correlation filter, based on the assumption that the change of the scale between two frames is smaller than the change of the position, after the translation correlation filter positions the position of a target in a new frame, the scale correlation filter extracts image blocks with different scales by taking the current target position as the center, and because the characteristic dimensions of the image blocks in the learning and detection formula of the translation correlation filter are arbitrary, the learning and detection of the scale correlation filter in the FDSST algorithm use the same mode.
3. The visual tracking method based on position prediction according to claim 1, characterized in that: in step S4, the tracking result evaluation criterion specifically includes the following:
s401, according to the response graph of the image frame obtained in the step S3, whether the target object is occluded or not can be qualitatively judged by judging whether the response graph has multiple peaks, if the current response graph has a multiple peak state and is violently vibrated, the tracking result based on the FDSST visual tracking algorithm is likely to be wrong at the moment, and thus the tracking is failed;
s402, according to the data of the relevant response diagram and the introduced tracking result evaluation standard, the specific formula is as follows:
Figure FDA0003638812060000032
wherein y is max Indicating the magnitude of the peak in the correlation response plot, E y The definition is as follows:
E y =∑ w,h |y w,h | 2 (4.2.2)
Through formula (4.2.1), whether an obstacle has appeared in the line of sight of the mobile robot's visual sensor can be judged qualitatively: if the value of T is close to 1, the target object is judged not to be occluded; if the value of T is small, the target object is considered occluded at that moment.
4. The visual tracking method based on position prediction according to claim 1, characterized in that: in step S5, the specific steps of predicting the position of the target object based on the Kalman filtering algorithm are as follows:
s501, according to the judgment result in the step S4, if the value of the tracking result evaluation standard value T is less than 0.25, the target object is judged to be shielded at the moment, and the tracking of the FDSST visual tracking algorithm is invalid at the moment, and the Kalman filtering algorithm is adopted to estimate the position of the target object at the next moment;
s502, Kalman filtering is to establish a model for the current motion state of a target so as to predict the motion state of the target at the next moment in the historical state observed quantity of the given target;
at time t, the state information in the image block includes: the coordinates (x, y) of the target centroid, the velocity v_x of the moving target along the x-axis, the velocity v_y along the y-axis, the width w of the target frame, and the height h of the target frame. Because the tracking algorithm can only identify the position and size of the moving target in a video image frame, the observed quantities are the coordinates (z_x, z_y) of the identified target's centroid together with the width w and height h of the target frame. To simplify the model and ensure the real-time performance of the tracking algorithm, the target is assumed to move linearly, and its motion is modeled using the above information:
the state and observed quantity of the target at time t are respectively expressed as:
X(t)=[x(t),y(t),v_x(t),v_y(t),w(t),h(t)] (5.3.1)
Z(t)=[z_x(t),z_y(t),z_w(t),z_h(t)] (5.3.2)
then, at time t+1, the state and the observation of the system are given by the state-space equations:
X(t+1|t)=A(t+1|t)X(t)+w(t+1) (5.3.3)
Z(t+1|t)=HX(t+1|t)+v(t+1) (5.3.4)
the state transition matrix A(t+1|t) and the system measurement matrix H can be derived from Newton's second law as follows:
A(t+1|t) =
[ 1 0 d_t 0 0 0
  0 1 0 d_t 0 0
  0 0 1 0 0 0
  0 0 0 1 0 0
  0 0 0 0 1 0
  0 0 0 0 0 1 ]   (5.3.5)
H =
[ 1 0 0 0 0 0
  0 1 0 0 0 0
  0 0 0 0 1 0
  0 0 0 0 0 1 ]   (5.3.6)
wherein d_t represents the time interval between image frames, and w(t+1) and v(t+1) are assumed to be zero-mean white Gaussian noise.
CN202210509658.XA 2022-05-11 2022-05-11 Visual tracking method based on position prediction Pending CN114882078A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210509658.XA CN114882078A (en) 2022-05-11 2022-05-11 Visual tracking method based on position prediction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210509658.XA CN114882078A (en) 2022-05-11 2022-05-11 Visual tracking method based on position prediction

Publications (1)

Publication Number Publication Date
CN114882078A 2022-08-09

Family

ID=82675480

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210509658.XA Pending CN114882078A (en) 2022-05-11 2022-05-11 Visual tracking method based on position prediction

Country Status (1)

Country Link
CN (1) CN114882078A (en)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106023244A (en) * 2016-04-13 2016-10-12 南京邮电大学 Pedestrian tracking method based on least square locus prediction and intelligent obstacle avoidance model
US20210227132A1 (en) * 2018-05-30 2021-07-22 Arashi Vision Inc. Method for tracking target in panoramic video, and panoramic camera
CN109598684A (en) * 2018-11-21 2019-04-09 华南理工大学 In conjunction with the correlation filtering tracking of twin network
CN110335293A (en) * 2019-07-12 2019-10-15 东北大学 A kind of long-time method for tracking target based on TLD frame
CN111260681A (en) * 2020-02-05 2020-06-09 河北科技大学 Moving target tracking method and moving target tracking device
CN113313739A (en) * 2021-06-23 2021-08-27 中国农业银行股份有限公司 Target tracking method, device and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
余铎 (Yu Duo) et al., "基于视觉的移动机器人目标跟踪方法" [Vision-based target tracking method for mobile robots], 仪器仪表学报 [Chinese Journal of Scientific Instrument], vol. 40, no. 1, 31 January 2019, pages 228-234 *

Similar Documents

Publication Publication Date Title
CN109344725B (en) Multi-pedestrian online tracking method based on space-time attention mechanism
CN108446585B (en) Target tracking method and device, computer equipment and storage medium
CN107369166B (en) Target tracking method and system based on multi-resolution neural network
CN111553950B (en) Steel coil centering judgment method, system, medium and electronic terminal
CN108573499B (en) Visual target tracking method based on scale self-adaption and occlusion detection
CN111627050B (en) Training method and device for target tracking model
CN109886994B (en) Self-adaptive occlusion detection system and method in video tracking
CN111598928B (en) Abrupt motion target tracking method based on semantic evaluation and region suggestion
CN109708658B (en) Visual odometer method based on convolutional neural network
CN112036381B (en) Visual tracking method, video monitoring method and terminal equipment
CN106875426B (en) Visual tracking method and device based on related particle filtering
CN112927303B (en) Lane line-based automatic driving vehicle-mounted camera pose estimation method and system
CN110675429A (en) Long-range and short-range complementary target tracking method based on twin network and related filter
CN110111370B (en) Visual object tracking method based on TLD and depth multi-scale space-time features
CN105654518B (en) A kind of trace template adaptive approach
CN107657627B (en) Space-time context target tracking method based on human brain memory mechanism
CN116385493A (en) Multi-moving-object detection and track prediction method in field environment
CN106127798B (en) Dense space-time contextual target tracking based on adaptive model
CN109064497B (en) Video tracking method based on color clustering supplementary learning
CN111462184A (en) Online sparse prototype tracking method based on twin neural network linear representation model
CN114596440A (en) Semantic segmentation model generation method and device, electronic equipment and storage medium
CN111531546B (en) Robot pose estimation method, device, equipment and storage medium
CN109271865A (en) Motion target tracking method based on scattering transformation multilayer correlation filtering
CN110751671A (en) Target tracking method based on kernel correlation filtering and motion estimation
CN114882078A (en) Visual tracking method based on position prediction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination