CN110276784B

CN110276784B - Correlation filtering moving target tracking method based on memory mechanism and convolution characteristics

Info

Publication number: CN110276784B
Application number: CN201910478278.2A
Authority: CN
Inventors: 宋勇; 王姗姗; 杨昕; 赵宇飞; 王枫宁; 郭拯坤
Original assignee: Beijing Institute of Technology BIT
Current assignee: Beijing Institute of Technology BIT
Priority date: 2019-06-03
Filing date: 2019-06-03
Publication date: 2021-04-06
Anticipated expiration: 2039-06-03
Also published as: CN110276784A

Abstract

The invention provides a related filtering moving target tracking method based on a memory mechanism and convolution characteristics, and belongs to the technical field of computer vision. The method utilizes a pre-trained deep convolutional neural network to extract the convolutional characteristic of a target, is inspired by a human brain memory mechanism in human visual information processing cognitive behaviors, and integrates the memory mechanism into the detection, training and updating processes of a classifier of a relevant filtering method. The memory mechanism consists of three parts, namely response diagram decision, adaptive peak detection and adaptive fusion coefficient. The method has stronger robustness, and can still continuously and stably realize target tracking under the conditions of violent deformation, reappearance or shielding after temporary disappearance and the like of the target. Meanwhile, the method has higher target tracking speed, reduces the complexity and reduces the operation amount.

Description

Correlation filtering moving target tracking method based on memory mechanism and convolution features

技术领域technical field

本发明涉及一种图像序列中运动目标的跟踪方法，具体涉及一种基于记忆机制与卷积特征的相关滤波运动目标跟踪方法，属于计算机视觉技术领域。The invention relates to a tracking method for moving objects in an image sequence, in particular to a related filtering moving object tracking method based on a memory mechanism and convolution features, and belongs to the technical field of computer vision.

背景技术Background technique

运动目标跟踪技术是计算机视觉科学的重要研究方向，在安全监控、人机接口、医疗诊断等领域应用广泛。目前，运动目标跟踪技术存在的主要问题是难以克服背景光照条件变化、目标发生遮挡、形状变化、尺寸变化和快速运动等复杂干扰因素的影响，导致跟踪精度下降。Moving target tracking technology is an important research direction of computer vision science, which is widely used in security monitoring, human-machine interface, medical diagnosis and other fields. At present, the main problem of moving target tracking technology is that it is difficult to overcome the influence of complex interference factors such as changes in background lighting conditions, target occlusion, shape changes, size changes, and rapid motion, resulting in a decrease in tracking accuracy.

判别式跟踪方法是一种重要的运动目标跟踪方法，具体包括：多样本学习(MultipleInstance Learning，MIL)跟踪方法、跟踪-学习-检测(Tracking-Learning-Detection，TLD)跟踪方法、基于核结构化输出(Structured output tracking withkernel，Struck)跟踪方法等。此类方法的原理是：首先，将目标作为正样本，将背景作为负样本，训练出分类器；然后，利用此分类器对搜索区域进行检测，将响应度最大的点视为目标中心位置，从而进行跟踪。通常，此类方法通过稀疏取样方式来训练分类器，即，在目标附近取若干个等尺寸的窗口作为样本。然而，当样本数量增大时，计算量也随之增加，从而降低了跟踪方法的实时性。The discriminative tracking method is an important moving target tracking method, which includes: Multiple Instance Learning (MIL) tracking method, Tracking-Learning-Detection (TLD) tracking method, kernel-based structure Output (Structured output tracking withkernel, Struck) tracking method, etc. The principle of this kind of method is: first, take the target as a positive sample and the background as a negative sample, and train a classifier; then, use this classifier to detect the search area, and regard the point with the largest responsiveness as the center of the target, to track. Typically, such methods train the classifier by sparse sampling, that is, taking several equal-sized windows around the target as samples. However, when the number of samples increases, the amount of computation increases, which reduces the real-time performance of the tracking method.

相关滤波跟踪方法，通过构造样本的循环矩阵，一定程度上解决了判别式跟踪方法训练样本不足和计算量大的问题。例如，Henriques等提出的KCF算法(Henriques J F,Rui C,Martins P,et al.“High-Speed Tracking with Kernelized CorrelationFilters”.IEEE Transactions on Pattern Analysis&Machine Intelligence,2014,37(3):583-596)，根据循环矩阵在经过傅里叶变换后成为对角矩阵的特点，将单个样本进行移位循环后，可在傅里叶域进行分类器的快速检测与训练。并且通过基于核的脊回归运算，实现相关滤波的过程。该算法不仅具有较高的实时性，而且实现了非线性条件下运动目标的精确跟踪。The correlation filtering tracking method, by constructing the cyclic matrix of the samples, solves the problems of insufficient training samples and large amount of computation of the discriminative tracking method to a certain extent. For example, the KCF algorithm proposed by Henriques et al. (Henriques J F, Rui C, Martins P, et al. "High-Speed Tracking with Kernelized Correlation Filters". IEEE Transactions on Pattern Analysis & Machine Intelligence, 2014, 37(3): 583-596), According to the characteristic that the circulant matrix becomes a diagonal matrix after Fourier transform, after shifting and circulating a single sample, the classifier can be quickly detected and trained in the Fourier domain. And through the kernel-based ridge regression operation, the process of correlation filtering is realized. The algorithm not only has high real-time performance, but also achieves accurate tracking of moving targets under nonlinear conditions.

近年来，深度学习领域的研究成果开始与相关滤波跟踪方法结合起来。例如，HCF算法(Ma C,Huang J B,Yang X,et al.Hierarchical Convolutional Features forVisual Tracking[C]//IEEE International Conference on Computer Vision.IEEEComputer Society,2015:3074-3082.)在KCF算法的框架下，将HOG特征替换为分层的卷积特征。根据高层特征含有更多语义信息，低层特征含有更多纹理、轮廓等局部信息的特点，先用最高层特征确定目标的大致位置，再向下逐渐精确定位，相较传统的手工提取的特征，具有更高的鲁棒性。In recent years, research results in the field of deep learning have begun to be combined with correlation filter tracking methods. For example, HCF algorithm (Ma C, Huang J B, Yang X, et al. Hierarchical Convolutional Features for Visual Tracking [C]//IEEE International Conference on Computer Vision. IEEE Computer Society, 2015: 3074-3082.) under the framework of KCF algorithm , replacing HOG features with hierarchical convolutional features. According to the characteristics that high-level features contain more semantic information, and low-level features contain more local information such as textures and contours, first use the top-level features to determine the approximate location of the target, and then gradually and accurately locate the target downward. Compared with the traditional hand-extracted features, Has higher robustness.

利用卷积特征的相关滤波算法虽具有上述优势，但也存在一定的局限性：一是分类器在检测和训练时共提取了两次卷积特征，计算量非常大；二是每帧以固定速率更新目标模板和分类器，导致适应目标剧烈变化的能力较差。因此，当目标出现形状突变、严重遮挡以及短暂消失后重现等情况时，其跟踪精度会明显下降，甚至丢失目标；并且难以满足实时性要求。Although the correlation filtering algorithm using convolution features has the above advantages, it also has certain limitations: First, the classifier extracts two convolution features during detection and training, which is very computationally expensive; Rate update target templates and classifiers, resulting in poor ability to adapt to drastic changes in the target. Therefore, when the target has sudden changes in shape, severe occlusion, and reappears after a short disappearance, its tracking accuracy will be significantly reduced, or even the target will be lost; and it is difficult to meet the real-time requirements.

发明内容SUMMARY OF THE INVENTION

本发明的目的是为了解决在目标发生姿态和形状突变、短暂消失后重现以及遮挡等干扰条件下能够精确、高速跟踪目标的问题，提出一种基于记忆机制与卷积特征的相关滤波运动目标跟踪方法，将人脑的记忆机制融入到相关滤波算法的分类器检测、训练与更新过程，能够实现复杂应用场景下精度高、鲁棒性强、运算速度快的运动目标跟踪。The purpose of the present invention is to solve the problem of accurate and high-speed tracking of the target under the interference conditions such as the sudden change of the posture and shape of the target, the reappearance after a short disappearance, and the occlusion, and proposes a related filtering moving target based on a memory mechanism and convolution features. The tracking method integrates the memory mechanism of the human brain into the classifier detection, training and update process of the relevant filtering algorithm, which can achieve moving target tracking with high accuracy, strong robustness and fast calculation speed in complex application scenarios.

本发明方法，利用预先训练过的深层卷积神经网络提取目标的卷积特征。受人类视觉信息处理认知行为中人脑记忆机制的启发，将记忆机制融入到相关滤波方法的分类器的检测、训练和更新过程之中。其中，记忆机制由响应图决策、自适应峰值检测和自适应融合系数三部分组成。记忆机制与分类器的检测、训练和更新融合的过程描述如下：The method of the present invention utilizes a pre-trained deep convolutional neural network to extract the convolution features of the target. Inspired by the human brain memory mechanism in the cognitive behavior of human visual information processing, the memory mechanism is integrated into the detection, training and update process of the classifier of the correlation filtering method. Among them, the memory mechanism consists of three parts: response map decision, adaptive peak detection and adaptive fusion coefficient. The process of integrating the memory mechanism with the detection, training and updating of the classifier is described as follows:

(1)基于响应图决策的分类器检测：提取出候选区域的卷积特征后，记忆空间内所有的分类器均与其进行卷积运算，得到各自的响应图，选取峰值最大的响应图对目标进行定位。(1) Classifier detection based on response map decision: After extracting the convolution feature of the candidate region, all classifiers in the memory space are convolved with it to obtain their respective response maps, and select the response map with the largest peak value to target the target. to locate.

(2)基于自适应峰值检测的分类器训练：目标定位后，综合响应图中主峰和次高干扰峰的大小关系与位置关系，对目标的变化情况进行分析。如果干扰程度大于阈值，则再次提取目标的卷积特征，训练新的分类器。如果干扰程度不大于阈值，则不进行分类器的训练与更新。(2) Classifier training based on adaptive peak detection: After the target is located, the size relationship and positional relationship between the main peak and the second-highest interference peak in the response graph are synthesized, and the change of the target is analyzed. If the degree of interference is greater than the threshold, the convolutional features of the target are extracted again to train a new classifier. If the degree of interference is not greater than the threshold, the classifier will not be trained and updated.

(3)基于自适应融合系数的分类器更新：训练好新的分类器后，根据峰值检测的结果，自适应地计算融合系数。干扰越剧烈，融合系数越大。(3) Classifier update based on adaptive fusion coefficient: After training a new classifier, the fusion coefficient is adaptively calculated according to the result of peak detection. The more severe the interference, the larger the fusion coefficient.

通过以上方式，实现记忆机制与跟踪方法的有机融合。Through the above methods, the organic integration of memory mechanism and tracking method is realized.

本发明方法的具体实现过程如下：The concrete realization process of the method of the present invention is as follows:

一种基于记忆机制与卷积特征的相关滤波运动目标跟踪方法，包括以下步骤：A related filtering moving target tracking method based on memory mechanism and convolution features, comprising the following steps:

步骤1：初始化记忆空间。Step 1: Initialize the memory space.

设记忆空间的容量为m，在第1到第m帧时，先将记忆空间填满，暂不执行记忆机制。在第i帧完成分类器的训练后，将该分类器的参数存储到记忆空间中，作为记忆空间内的第i个分类器w[i]，i∈{1,...,m}。此时除了对记忆空间进行初始化外，此时本发明方法与一般的相关滤波跟踪方法的步骤完全相同。当记忆空间被填满后，开始在后续帧中执行记忆机制。Let the capacity of the memory space be m, in the 1st to mth frames, the memory space is filled first, and the memory mechanism is not executed for the time being. After completing the training of the classifier in the ith frame, the parameters of the classifier are stored in the memory space as the ith classifier w[i], i∈{1,...,m} in the memory space. Except for initializing the memory space, the steps of the method of the present invention are exactly the same as those of the general correlation filter tracking method. When the memory space is filled, the memory mechanism starts to execute in subsequent frames.

步骤2：进行基于响应图决策的分类器检测。Step 2: Perform classifier detection based on response graph decision.

步骤2.1：提取当前帧候选区域的卷积特征。Step 2.1: Extract the convolutional features of the candidate region of the current frame.

读取第t帧图像，t>m，根据上一帧确定的目标中心位置选取候选区域。借助预训练过的卷积神经网络，提取跟踪窗口的卷积特征。将后续区域图像输入到卷积神经网络后，选取19个卷积层中的L层的输出作为卷积特征x_t。t时刻候选区域在第l层的特征表示为x_t[l]，l∈L。Read the t-th frame image, t>m, and select the candidate area according to the target center position determined in the previous frame. With the help of pre-trained convolutional neural network, the convolutional features of the tracking window are extracted. After inputting the subsequent region images into the convolutional neural network, the output of the L layer among the 19 convolutional layers is selected as the convolutional feature x _t . The feature of the candidate region at the lth layer at time t is denoted as _xt [l], l∈L.

提取出卷积特征x_t后，以x_t为生成矩阵构造循环矩阵，从而得到检测样本C(x_t)。After the convolution feature x _t is extracted, a circulant matrix is constructed with x _t as the generator matrix, so as to obtain the detection sample C(x _t ).

步骤2.2：检测记忆空间中所有分类器。Step 2.2: Detect all classifiers in the memory space.

令w_t-1[i,l]表示记忆空间中在第t帧之前学习到的第i个分类器对应第l层特征的参数，i∈{1,...,m}，l∈L。用检测样本C(x_t)与分类器卷积得到响应图，响应图上最大响应值的位置被视为目标位置。Let w _t-1 [i,l] denote the parameters of the i-th classifier learned before the t-th frame in the memory space corresponding to the l-th layer feature, i∈{1,...,m}, l∈L . The response map is obtained by convolving the detection sample C(x _t ) with the classifier, and the position of the maximum response value on the response map is regarded as the target position.

由循环矩阵性质可知，任意矩阵与循环矩阵在时域上的卷积，均可表示为其与循环矩阵的生成矩阵在频域上的点乘。将每层特征的响应f_t[i,l]按照固定权重相加，得到记忆空间中第i个分类器在第t帧时的响应图f_t[i]：From the properties of the circulant matrix, it can be known that the convolution of any matrix and the circulant matrix in the time domain can be expressed as the dot product of the generator matrix of the circulant matrix in the frequency domain. The response f _t [i, l] of each layer feature is added according to the fixed weight, and the response map f _t [i] of the i-th classifier in the memory space at the t-th frame is obtained:

其中，

表示快速反傅里叶变换(IFFT)操作，⊙为点乘运算符，大写字母代表变量的傅里叶变换形式，γ是融合权重。X_t[l]代表第t帧时第l层特征的傅里叶变换形式。in,

Indicates the Inverse Fast Fourier Transform (IFFT) operation, ⊙ is the dot product operator, capital letters represent the Fourier transform form of the variable, and γ is the fusion weight. X _t [l] represents the Fourier transform form of the l-th layer feature at the t-th frame.

令记忆空间中所有的分类器均与循环样本进行卷积运算，得到m张响应图，取响应峰值最大的响应图推测目标位置，并使该响应图对应的分类器进行后续的训练与更新：Make all the classifiers in the memory space perform convolution operation with the cyclic samples to obtain m response maps, take the response map with the largest response peak to infer the target position, and make the classifier corresponding to the response map perform subsequent training and update:

式中，π为最大峰值响应图对应的分类器在记忆空间中的索引。In the formula, π is the index of the classifier corresponding to the maximum peak response map in the memory space.

步骤3：进行基于自适应峰值检测的分类器训练。Step 3: Perform classifier training based on adaptive peak detection.

步骤3.1：自适应峰值检测。Step 3.1: Adaptive peak detection.

同时计算与比较响应图上主峰与干扰峰的位置和峰值大小关系，选取主峰之外的次高峰作为干扰峰。当干扰峰距离主峰较远时，即使干扰峰值较高，也认为目标没有受到遮挡；当干扰峰出现在离主峰较近的位置时，即使干扰峰值不高，也判断目标受到遮挡。利用峰值干扰度对目标状态进行判断，公式如下：At the same time, the relationship between the position and peak size of the main peak and the interference peak on the response graph was calculated and compared, and the secondary peak other than the main peak was selected as the interference peak. When the interference peak is far from the main peak, even if the interference peak is high, it is considered that the target is not blocked; when the interference peak appears near the main peak, even if the interference peak is not high, it is judged that the target is blocked. Using the peak interference degree to judge the target state, the formula is as follows:

其中，响应图以主峰中心为原点重新划定了坐标系，H为响应图上主峰的峰值，h为干扰峰峰值，M为主峰在干扰峰方向上到响应图边缘的距离，

是干扰峰相对于主峰的位置向量，

为构造的抛物面。如干扰峰高于此曲面，则认为目标发生了剧烈变化。ρ值为干扰峰超过曲面的距离与整个干扰峰高度的比值，若峰值干扰度ρ＝0，则跳过以下所有步骤，不再进行分类器的训练与更新，直接进入下一帧；当峰值干扰度ρ>0时，执行以下步骤：Among them, the response graph redraws the coordinate system with the center of the main peak as the origin, H is the peak value of the main peak on the response graph, h is the peak value of the interference peak, M is the distance from the main peak in the direction of the interference peak to the edge of the response graph,

is the position vector of the interference peak relative to the main peak,

for the constructed paraboloid. If the interference peak is higher than this surface, the target is considered to have changed drastically. The ρ value is the ratio of the distance between the interference peak exceeding the surface and the height of the entire interference peak. If the peak interference degree ρ=0, skip all the following steps, no longer train and update the classifier, and directly enter the next frame; When the degree of interference ρ>0, perform the following steps:

步骤3.2：提取当前帧目标区域的卷积特征。Step 3.2: Extract the convolutional features of the target region of the current frame.

根据步骤2中当前帧定位的结果，以目标中心为中心，扩展得到与后续区域大小相同的目标区域，将目标区域输入到卷积神经网络中，提取其卷积特征x_t'。According to the result of positioning the current frame in step 2, take the center of the target as the center, expand the target area with the same size as the subsequent area, input the target area into the convolutional neural network, and extract its convolutional feature x _t '.

步骤3.3：分类器训练。Step 3.3: Classifier training.

峰值干扰度ρ>0，表明在步骤3.1中选取的峰值最大响应图对应的分类器与目标的匹配程度较差，需要训练新的分类器w_t'以适应目标的变化。The peak interference degree ρ>0 indicates that the classifier corresponding to the peak maximum response map selected in step 3.1 has a poor matching degree with the target, and a new classifier _wt ' needs to be trained to adapt to the change of the target.

训练分类器的原理与一般的相关滤波方法相同，通过最小化下式训练第l层特征对应的分类器参数w_t'[l]：The principle of training the classifier is the same as the general correlation filtering method. The classifier parameter w _t '[l] corresponding to the feature of the lth layer is trained by minimizing the following formula:

其中，x'_t[l]为训练时在新位置上提取的特征，*为卷积运算符，λ为l2正则化参数；y是训练的目标标签函数，是一个尺寸与分类器大小相同的二维高斯函数，峰值位于中心处。Among them, x' _t [l] is the feature extracted at the new position during training, * is the convolution operator, λ is the l2 regularization parameter; y is the training target label function, which is the same size as the classifier. A 2D Gaussian function with the peak at the center.

该最小化问题的闭式解为：The closed-form solution to this minimization problem is:

其中，

表示目标标签函数的傅里叶变换形式。in,

Represents the Fourier transform form of the target label function.

步骤4：进行基于自适应融合系数的分类器更新。Step 4: Perform classifier update based on adaptive fusion coefficients.

新的分类器参数w_t'训练好后，对记忆空间中的分类器进行更新。分类器w_t-1[π]与w_t'进行加权融合，其余分类器参数不变，公式如下：After the new classifier parameters w _t ' are trained, the classifier in the memory space is updated. The classifier w _t-1 [π] is weighted and fused with w _t ', and the rest of the classifier parameters remain unchanged, and the formula is as follows:

其中，λ为在该分类器在当前帧的融合系数，利用Sigmoid函数自适应求得：Among them, λ is the fusion coefficient of the classifier in the current frame, which is obtained adaptively using the Sigmoid function:

其中，λ关于ρ单调递增，使得目标的变化越剧烈，分类器更新的速率越快；e为自然对数符号。Among them, λ increases monotonically with respect to ρ, so that the more severe the change of the target, the faster the update rate of the classifier; e is the natural logarithmic symbol.

有益效果beneficial effect

本发明方法，对比现有运动目标跟踪方法，具有以下优点：Compared with the existing moving target tracking method, the method of the present invention has the following advantages:

(1)鲁棒性强。本发明方法具有较强的鲁棒性，通过将人脑记忆机制融入到相关滤波算法中，使得算法在跟踪时可以记忆目标出现过的状态。一方面，利用响应图决策从记忆空间中选择最合适的分类器进行检测。另一方面，利用自适应峰值检测进行分类器的训练，仅在目标发生剧烈变化时才重新提取目标的卷积特征重新训练分类器，同时根据峰值检测的结果自适应的计算融合系数，进行分类器的更新，从而在目标发生剧烈形变、短暂消失后重现或遮挡等条件下，仍然能持续稳定地实现目标跟踪。(1) Strong robustness. The method of the invention has strong robustness, and by incorporating the memory mechanism of the human brain into the relevant filtering algorithm, the algorithm can remember the state that the target has appeared in during tracking. On the one hand, the response graph decision is used to select the most suitable classifier from the memory space for detection. On the other hand, the classifier is trained using adaptive peak detection, and the convolution features of the target are re-extracted to retrain the classifier only when the target changes drastically. At the same time, the fusion coefficient is adaptively calculated according to the results of peak detection to classify Therefore, under the conditions of violent deformation of the target, reappearance after a short disappearance or occlusion, etc., the target tracking can still be achieved continuously and stably.

(2)跟踪速度快。本发明方法具有较高的目标跟踪速度。一方面，在相关滤波的框架下，通过循环偏移构建分类器的训练样本。同时，基于循环矩阵的特性把问题变换至频域求解，避免了矩阵求逆过程，从而极大地降低了算法的复杂度。另一方面，将目标在不同状态下的分类器参数储存于记忆空间。当相似状态再次出现时，直接根据响应值选择调用分类器，无需再提取目标区域的CNN特征重新训练，从而减小了接近一半的运算量。(2) The tracking speed is fast. The method of the invention has higher target tracking speed. On the one hand, under the framework of correlation filtering, the training samples of the classifier are constructed by cyclic offset. At the same time, the problem is transformed to the frequency domain solution based on the characteristics of the circulant matrix, which avoids the matrix inversion process, thus greatly reducing the complexity of the algorithm. On the other hand, the classifier parameters of the target in different states are stored in the memory space. When a similar state appears again, the classifier is directly selected and called according to the response value, and there is no need to extract the CNN features of the target area for retraining, thus reducing the amount of computation by nearly half.

附图说明Description of drawings

图1为本发明方法的原理流程图；Fig. 1 is the principle flow chart of the method of the present invention;

图2为本发明方法中基于响应图决策的分类器检测步骤的原理示意图；Fig. 2 is the principle schematic diagram of the classifier detection step based on the response graph decision in the method of the present invention;

图3为本发明方法中基于自适应峰值检测的分类器训练步骤的原理示意图；Fig. 3 is the principle schematic diagram of the classifier training step based on adaptive peak detection in the method of the present invention;

图4为本发明方法中基于自适应融合系数的分类器更新步骤的原理示意图；4 is a schematic diagram of the principle of updating steps of a classifier based on adaptive fusion coefficients in the method of the present invention;

图5为本发明方法的具体流程图；Fig. 5 is the concrete flow chart of the method of the present invention;

图6为本发明方法与常规HCF方法的跟踪结果对比；Fig. 6 is the tracking result comparison of the inventive method and conventional HCF method;

图7为本发明方法与常规HCF方法的跟踪精度曲线；Fig. 7 is the tracking precision curve of the inventive method and conventional HCF method;

图8为本发明方法与常规HCF方法的跟踪指标对比。FIG. 8 is a comparison of the tracking indicators between the method of the present invention and the conventional HCF method.

具体实施方式Detailed ways

下面结合附图与实施例对本发明方法进行具体说明。The method of the present invention will be described in detail below with reference to the accompanying drawings and embodiments.

实施例Example

一种基于记忆机制与卷积特征的相关滤波运动目标跟踪方法，其实现过程如图2所示，包括如下步骤：A related filtering moving target tracking method based on memory mechanism and convolution features, the implementation process is shown in Figure 2, including the following steps:

步骤1：初始化记忆空间。Step 1: Initialize the memory space.

令记忆空间的容量m为4。在第1到4帧时，除了对记忆空间进行初始化外，本发明方法与一般的相关滤波跟踪方法完全相同。在每一帧完成分类器的训练后，将该分类器的参数存储到记忆空间中，作为记忆空间内的第i个分类器。第4帧结束时，记忆空间被填满，在后续帧中开始执行记忆机制。Let the capacity m of the memory space be 4. In frames 1 to 4, except for initializing the memory space, the method of the present invention is exactly the same as the general correlation filter tracking method. After the training of the classifier is completed in each frame, the parameters of the classifier are stored in the memory space as the ith classifier in the memory space. At the end of frame 4, the memory space is filled, and the memory mechanism begins to execute in subsequent frames.

步骤2：基于响应图决策的分类器检测。Step 2: Classifier detection based on response graph decision.

读取第t帧图像，根据上一帧确定的目标中心位置选取候选区域。本发明方法利用已经训练好的VGG-19卷积神经网络提取跟踪窗口的卷积特征。将后续区域图像输入到卷积网络后，选取19个卷积层中的Conv3-4，Conv4-4和Conv5-4的输出作为卷积特征，即L＝{Conv3-4,Conv4-4,Conv5-4}。t时刻时候选区域在第l层的特征表示为x_t[l]，l∈L。Read the t-th frame image, and select the candidate area according to the target center position determined in the previous frame. The method of the invention extracts the convolution feature of the tracking window by using the trained VGG-19 convolutional neural network. After inputting the subsequent region images into the convolutional network, the outputs of Conv3-4, Conv4-4 and Conv5-4 in the 19 convolutional layers are selected as the convolutional features, that is, L={Conv3-4, Conv4-4, Conv5 -4}. The feature of the selected region at the lth layer at time t is represented as _xt [l], l∈L.

步骤2.2：记忆空间中所有分类器的检测。Step 2.2: Detection of all classifiers in memory space.

令w_t-1[i,l]表示记忆空间中在第t帧之前学习到的第i个分类器对应第l层特征的参数，i∈{1,2,3,4}，l∈L。用检测样本C(x_t)与分类器卷积可得到响应图，响应图上最大响应值的位置被视为目标位置。Let w _t-1 [i,l] denote the parameters of the i-th classifier learned before the t-th frame in the memory space corresponding to the l-th layer feature, i∈{1,2,3,4}, l∈L . The response map can be obtained by convolving the detection sample C(x _t ) with the classifier, and the position of the maximum response value on the response map is regarded as the target position.

由循环矩阵的性质可知，任意矩阵与循环矩阵在时域上的卷积，均可表示为其与循环矩阵的生成矩阵在频域上的点乘。将每层特征的响应f_t[i,l]按照固定的权重相加，得到则记忆空间中第i个分类器在第t帧时的响应图f_t[i]：From the properties of the circulant matrix, it can be known that the convolution of any matrix and the circulant matrix in the time domain can be expressed as the dot product of the generator matrix of the circulant matrix in the frequency domain. The response f _t [i, l] of each layer of features is added according to a fixed weight, and the response map f _t [i] of the i-th classifier in the memory space at the t-th frame is obtained:

其中，

表示快速反傅里叶变换(IFFT)操作，⊙为点乘运算符，大写字母代表变量的傅里叶变换形式，γ是融合权重，设为γ＝{0.25,0.5,1}。in,

represents the Inverse Fast Fourier Transform (IFFT) operation, ⊙ is the dot product operator, capital letters represent the Fourier transform form of the variable, and γ is the fusion weight, set to γ={0.25, 0.5, 1}.

令记忆空间中所有的分类器均与循环样本进行卷积运算，得到m张响应图，取响应峰值最大的响应图推测目标位置。并使该响应图对应的分类器进行后续的训练与更新。All the classifiers in the memory space are convolved with the cyclic samples to obtain m response maps, and the response map with the largest response peak is taken to infer the target position. And make the classifier corresponding to the response map perform subsequent training and update.

式中的π为最大峰值响应图对应的分类器在记忆空间中的索引。where π is the index of the classifier corresponding to the maximum peak response graph in the memory space.

步骤3：基于自适应峰值检测的分类器训练。Step 3: Classifier training based on adaptive peak detection.

步骤3.1：自适应峰值检测Step 3.1: Adaptive Peak Detection

自适应峰值检测的核心思想为：同时计算与比较响应图上主峰与干扰峰的位置和峰值大小关系。选取主峰之外的次高峰作为干扰峰。当干扰峰距离主峰较远时，即使干扰峰值较高，也认为目标没有受到遮挡；当干扰峰出现在离主峰较近的位置时，即使干扰峰值不高，也判断目标受到遮挡。利用峰值干扰度对目标状态进行判断，计算公式如下：The core idea of adaptive peak detection is to calculate and compare the relationship between the position and peak size of the main peak and the interference peak on the response graph at the same time. The secondary peaks other than the primary peak were selected as interference peaks. When the interference peak is far from the main peak, even if the interference peak is high, it is considered that the target is not blocked; when the interference peak appears near the main peak, even if the interference peak is not high, it is judged that the target is blocked. The target state is judged by the peak interference degree, and the calculation formula is as follows:

式中，响应图以主峰中心为原点重新划定了坐标系，H为响应图上主峰的峰值，h为干扰峰峰值，M为主峰在干扰峰方向上到响应图边缘的距离，

是干扰峰相对于主峰的位置向量，

为构造的抛物面。如果干扰峰高于此曲面，则认为目标发生了剧烈变化。ρ的值即为干扰峰超过曲面的距离与整个干扰峰高度的比值。若峰值干扰度ρ＝0，则跳过以下步骤，不再进行分类器的训练与更新，直接进入下一帧。In the formula, the response graph re-defines the coordinate system with the center of the main peak as the origin, H is the peak value of the main peak on the response graph, h is the peak value of the interference peak, M is the distance from the main peak in the direction of the interference peak to the edge of the response graph,

is the position vector of the interference peak relative to the main peak,

for the constructed paraboloid. If the interference peak is above this surface, the target is considered to have changed drastically. The value of ρ is the ratio of the distance of the interference peak beyond the surface to the height of the entire interference peak. If the peak interference degree ρ = 0, the following steps are skipped, the classifier training and updating are no longer performed, and the next frame is directly entered.

当峰值干扰度ρ>0时，执行以下步骤。When the peak interference level ρ>0, the following steps are performed.

根据步骤2中当前帧定位的结果，以目标中心为中心，扩展得到与后续区域大小相同的目标区域。将目标区域输入到VGG-19网络中，提取其卷积特征x_t'。According to the positioning result of the current frame in step 2, take the center of the target as the center, and expand the target area with the same size as the subsequent area. The target region is input into the VGG-19 network, and its convolutional features x _t ' are extracted.

步骤3.3：分类器训练。Step 3.3: Classifier training.

峰值干扰度ρ>0，即表明在步骤3.1中选取的峰值最大响应图对应的分类器与目标的匹配程度较差，需要训练新的分类器w_t'以适应目标的变化。The peak interference degree ρ>0 means that the classifier corresponding to the peak maximum response map selected in step 3.1 has a poor matching degree with the target, and a new classifier _wt ' needs to be trained to adapt to the change of the target.

式中，*为卷积运算符，λ为l2正则化参数，y是训练的目标标签函数，是一个尺寸与分类器大小相同的二维高斯函数，峰值位于中心处。where * is the convolution operator, λ is the l2 regularization parameter, y is the training target label function, a two-dimensional Gaussian function with the same size as the classifier, and the peak is at the center.

步骤4：基于自适应融合系数的分类器更新。Step 4: Classifier update based on adaptive fusion coefficients.

新的分类器参数w_t'训练好后，对记忆空间中的分类器进行更新。分类器w_t-1[π]与w_t'进行加权融合，其余分类器参数不变，用公式描述为：After the new classifier parameters w _t ' are trained, the classifier in the memory space is updated. The classifier w _t-1 [π] is weighted and fused with w _t ', and the rest of the classifier parameters remain unchanged, which is described by the formula as:

λ关于ρ单调递增，使得目标的变化越剧烈，分类器更新的速率越快。λ increases monotonically with respect to ρ, so that the more drastic the target changes, the faster the classifier updates.

本发明的仿真效果通过下述仿真实验说明：The simulation effect of the present invention is illustrated by the following simulation experiments:

1.仿真条件：1. Simulation conditions:

本发明在Intel(R)Core(TM)i7-7700HQ CPU 2.80GHz,RAM 8.00G,GTX1050GPU的PC上，使用MATLAB 2017b平台，对Visual Tracker Benchmark视频测试集中的视频序列完成仿真实验。The present invention uses the MATLAB 2017b platform on the PC of Intel(R) Core(TM) i7-7700HQ CPU 2.80GHz, RAM 8.00G, GTX1050GPU to complete the simulation experiment on the video sequences in the Visual Tracker Benchmark video test set.

2.仿真结果：2. Simulation results:

图3是对目标存在明显遮挡的的视频序列跟踪结果图，分别是第330、371、390和410帧，图中的矩形方框表示常规方法和本发明方法跟踪到的结果。从图3可以看出，在运动目标存在明显遮挡后又重新出现的过程中，本发明可以对目标进行准确跟踪。Fig. 3 is a graph of the tracking result of a video sequence with obvious occlusion of the target, which are the 330th, 371st, 390th and 410th frames respectively. The rectangular boxes in the figure represent the tracking results of the conventional method and the method of the present invention. It can be seen from FIG. 3 that the present invention can accurately track the moving target in the process of reappearing after obvious occlusion.

图4是本发明方法与常规HCF算法跟踪精度曲线对比图。跟踪精度曲线的横坐标是指仿真跟踪结果的目标中心和groundtruth中标注的真实中心的欧氏距离，纵坐标是指该欧式距离小于某一阈值的帧数占整个测试视频序列长度的比例。图5是在距离阈值为20像素处的跟踪精确度与跟踪速度(FPS：每秒帧数)对比图。经评估统计，对于Lemming序列，常规HCF算法和本发明方法的跟踪结果与目标实际位置的距离在20像素内的概率分别为0.6820和0.8920，其跟踪精度提高了30.8％。当CNN运算在GPU上完成时，常规HCF算法和本发明提出的算法的速度分别为4.4751fps和5.1678fps，提高了15.5％；当CNN运算在CPU上完成时，两种算法的速度分别为1.1653fps和2.1363fps，提高了83.3％。FIG. 4 is a comparison diagram of the tracking accuracy curve between the method of the present invention and the conventional HCF algorithm. The abscissa of the tracking accuracy curve refers to the Euclidean distance between the target center of the simulation tracking result and the real center marked in the groundtruth, and the ordinate refers to the ratio of the number of frames whose Euclidean distance is less than a certain threshold to the length of the entire test video sequence. Figure 5 is a graph of tracking accuracy versus tracking speed (FPS: frames per second) at a distance threshold of 20 pixels. According to the evaluation statistics, for the Lemming sequence, the probability that the distance between the tracking result of the conventional HCF algorithm and the method of the present invention and the actual position of the target within 20 pixels is 0.6820 and 0.8920, respectively, and the tracking accuracy is improved by 30.8%. When the CNN operation is completed on the GPU, the speed of the conventional HCF algorithm and the algorithm proposed by the present invention are 4.4751fps and 5.1678fps, an increase of 15.5%; when the CNN operation is completed on the CPU, the speed of the two algorithms is 1.1653 fps and 2.1363fps, an improvement of 83.3%.

Claims

1. A correlation filtering moving target tracking method based on a memory mechanism and convolution characteristics is characterized by comprising the following steps:

firstly, initializing a memory space, and the method comprises the following steps:

setting the capacity of a memory space as m, filling the memory space when the frames 1 to m are processed, not executing a memory mechanism for the moment, storing parameters of a classifier into the memory space after the training of the classifier is completed in the ith frame, using the parameters as an ith classifier w [ i ], wherein i belongs to { 1., m } in the memory space, and starting executing the memory mechanism in a subsequent frame when the memory space is filled;

secondly, extracting the convolution characteristics of the target by utilizing a pre-trained deep convolutional neural network, and integrating a memory mechanism into the detection, training and updating fusion process of a classifier of a related filtering method, wherein the memory mechanism consists of three parts, namely response diagram decision, adaptive peak detection and adaptive fusion coefficients; specifically, a memory mechanism is integrated into a detection, training and updating fusion process of a classifier of a relevant filtering method, and the method comprises the following steps:

the classifier detection based on the response graph decision comprises the following steps: after extracting the convolution characteristics of the candidate region, carrying out convolution operation on all classifiers in the memory space with the candidate region to obtain respective response graphs, and selecting the response graph with the maximum peak value to position the target;

the classifier training based on the self-adaptive peak detection comprises the following steps:

step A: self-adaptive peak detection;

meanwhile, the position and peak value size relation of the main peak and the interference peak on the response graph is calculated and compared, and the secondary peak except the main peak is selected as the interference peak; when the interference peak is far away from the main peak, even if the interference peak is high, the target is considered not to be shielded, and when the interference peak is close to the main peak, the target is judged to be shielded even if the interference peak is not high;

and judging the target state by using the peak interference degree, wherein the formula is as follows:

wherein, the response diagram uses the center of the main peak as the origin to define the coordinate system again, H is the peak value of the main peak on the response diagram, H is the peak value of the interference peak, M is the distance from the main peak to the edge of the response diagram in the direction of the interference peak,

is the position vector of the interference peak relative to the main peak,

for a constructed paraboloid, if the interference peak is higher than the curved surface, the target is considered to be changed violently; the rho value is the ratio of the distance of the interference peak exceeding the curved surface to the height of the whole interference peak, if the peak value interference degree rho is 0, the subsequent steps are skipped, the training and updating of the classifier are not carried out, and the next frame is directly entered; when peak interference degree rho>0, executing the following steps:

and B: extracting convolution characteristics of a current frame target area;

according to the positioning result of the current frame in the classifier detection process, a target center is used as a center, a target area with the same size as a subsequent area is obtained through expansion, the target area is input into a convolutional neural network, and the convolutional characteristic x of the target area is extracted_t'；

And C: training a classifier;

peak interference degree rho>0, indicating that the degree of matching between the classifier corresponding to the peak maximum response graph selected in the step A and the target is poor, and a new classifier w needs to be trained_t' to accommodate changes in goals;

training classifier parameters w corresponding to the l-th layer features by minimizing the following formula_t'[l]：

Wherein, x'_t[l]Features extracted at new positions during training are convolution operators, and lambda is a l2 regularization parameter; y is a trained target label function and is a two-dimensional Gaussian function with the same size as the classifier, and the peak value is positioned at the center;

the closed-form solution to this minimization problem is:

wherein,

a Fourier transform form representing a target tag function;

updating a classifier based on a self-adaptive fusion coefficient, wherein the method comprises the following steps: after a new classifier is trained, a fusion coefficient is calculated in a self-adaptive mode according to a peak detection result, and the more severe the interference is, the larger the fusion coefficient is.

2. The method for tracking the moving object based on the correlation filtering of the memory mechanism and the convolution characteristic as claimed in claim 1, wherein the classifier detection based on the response graph decision is as follows:

step 2.1: extracting convolution characteristics of current frame candidate region

Setting the memory space capacity as m; reading the t-th frame image, t>m, selecting a candidate area according to the target center position determined by the previous frame; extracting the convolution characteristic of the tracking window by means of a pre-trained convolution neural network; after the subsequent region image is input into the convolutional neural network, the output of L layers in 19 convolutional layers is selected as the convolutional characteristic x_tAnd the characteristic of the candidate region at the ith layer at the moment t is represented as x_t[l]，l∈L；

Extracting convolution features x_tThen, with x_tConstructing a circulant matrix for the generated matrix to obtain a test sample C (x)_t)；

Step 2.2: detecting all classifiers in a memory space

Let w_t-1[i,l]Parameters representing the ith classifier learned before the tth frame in memory space corresponding to the ith layer features, i ∈ { 1., m }, L ∈ L, are used to detect the sample C (x)_t) Convolving with a classifier to obtain a response graph, and regarding the position of the maximum response value on the response graph as a target position;

the response f of each layer characteristic_t[i,l]Adding according to fixed weight to obtain the response image f of the ith classifier in the memory space at the t frame_t[i]：

Wherein,

denotes an Inverse Fast Fourier Transform (IFFT) operation, which is a dot product operator, capital letters denote Fourier transform forms of variables, and γ is a fusion weight; x_t[l]A Fourier transform form representing the characteristics of the l < th > layer at the t < th > frame;

performing convolution operation on all classifiers in the memory space and the cyclic sample to obtain m response graphs, estimating a target position by taking the response graph with the maximum response peak value, and performing subsequent training and updating on the classifier corresponding to the response graph:

in the formula, pi is the index of the classifier corresponding to the maximum peak response graph in the memory space.

3. The method for tracking the moving object based on the correlation filtering of the memory mechanism and the convolution characteristic as claimed in claim 1, wherein the classifier updating method based on the adaptive fusion coefficient is as follows:

new classifier parametersNumber w_tAfter training, updating the classifier in the memory space; classifier w_t-1[π]And w_t' carry out weighted fusion, and the parameters of the rest classifiers are unchanged, and the formula is as follows:

wherein λ is a fusion coefficient of the classifier at the current frame, and is obtained by using a Sigmoid function in a self-adaptive manner:

wherein λ monotonically increases with respect to ρ such that the more drastic the change in the target, the faster the rate of classifier update; e is a natural log symbol.