CN106803265A - Multi-object tracking method based on optical flow method and Kalman filtering - Google Patents
Abstract
The invention discloses a multi-target tracking method based on the optical flow method and Kalman filtering. First, the input video frames are processed by the optical flow method. Second, outliers are removed by clustering the optical flow. Then, morphological dilation and an improved median filter are applied, so that the moving targets are acquired accurately. Finally, based on the acquired target information, a Kalman filter processes the subsequent image sequence and predicts the motion of the targets, thereby achieving effective tracking of the moving targets. The invention requires neither training a classifier nor constructing target templates, and after clustering the moving targets can be better marked with the optical flow information.
Description
Technical Field
The invention relates to a target tracking algorithm, and in particular to a multi-target tracking method based on the optical flow method and Kalman filtering, belonging to the field of computer vision.
Background
Vision is one of the main means by which humans obtain information from the natural environment. Under visible light, the surroundings form an image on the retina of the human eye, which photoreceptor cells convert into nerve impulses transmitted through nerve fibers to the cerebral cortex for processing and analysis. Computer vision simulates the visual function of the human eye with a computer, extracting useful information from images and analyzing the morphology and kinematics of the background and targets in the environment. Target tracking based on image processing is one of the frontier topics in computer vision; its research spans image analysis, computer vision, pattern recognition, machine learning, automatic control and other fields. Target tracking also has very broad application prospects, mainly in intelligent video surveillance, intelligent driver assistance, visual navigation, virtual reality, weapon guidance and so on. Research on target tracking therefore has important theoretical and practical significance.
The popularization of computers and multimedia devices has made it convenient for researchers to use various image acquisition devices to capture information about the external environment and convert it into easily stored digital signals, so that computers can process image information much as the human brain does and take over some vision-related work from humans. Target tracking means determining the position of a target in the video image to estimate its motion state, predicting its motion characteristics in the next frame, and then matching the target in the video according to the acquired features, so as to track the target continuously and accurately. Current target tracking methods fall into two categories according to whether they rely on prior knowledge of the target. Methods that rely on prior knowledge must model the target and then perform template matching in the subsequent image sequence to find the target of interest. Methods that do not can detect moving targets directly from the image sequence and, combined with a suitable search algorithm, track them effectively.
Summary of the Invention
In view of the above defects, the present invention studies the optical flow method and Kalman filtering. The optical flow method approximates the true motion field and, under fairly ideal conditions, can detect independently moving targets without any prior information about the scene. In addition, because the time interval between consecutive frames is very short, the motion of each moving target can be approximated as uniform motion. Kalman filtering is an algorithm that uses a state equation and an observation equation to perform linear minimum-variance estimation of the state sequence of a system. It makes an optimal estimate of the next state from the system's previous state sequence; its predictions are unbiased, stable and optimal, require little computation, and can accurately predict the position and velocity of a target. The present invention therefore combines the advantages of the pyramid-based Lucas-Kanade optical flow method and Kalman filtering, fusing the two for target tracking, and in the process applies an improved median filtering algorithm to the acquired images so that the tracked targets are acquired more accurately.
To achieve the above object, the present invention adopts the following technical scheme: a multi-target tracking method based on the optical flow method and Kalman filtering, comprising the following steps:
(1) Read the video frames and compute the optical flow of each frame with the pyramid-based L-K optical flow method.
(2) Cluster the optical flow into several optical flow classes and obtain an optical flow detection effect map marked with the optical flow information.
(3) Denoise the optical flow detection effect map with the improved median filtering method.
(4) The number of optical flow classes is the number of moving targets; pass the information of these moving targets to the Kalman filter for tracking.
In the above scheme, the specific process of the pyramid-based L-K optical flow method is as follows: the optical flow is computed on the image of the highest layer; the computed result serves as the initial value for the next layer down, on the basis of which the optical flow of that layer is computed; this process is repeated until the last layer, the original image layer, is reached.
In a specific embodiment of the present invention, the optical flow is calculated as follows.

The following function is defined on a local neighborhood centered on point a and its value is minimized:

E(V_a) = Σ_{(x,y)∈Ω} W^2(x, y) [∇I(a) · V_a + I_t]^2

where Ω denotes the local neighborhood of point a, W(x, y) denotes the weight function, ∇I(a) denotes the gradient of the image at point a, V_a denotes the optical flow of point a, and I_t denotes the temporal derivative at time t of the gray value I = I(x, y, t) of point a = (x, y).

Solving this minimization problem yields:

A = [∇I(x_1), ∇I(x_2), ..., ∇I(x_n)]^T denotes the column vector of the gradients at the n points;
W = diag[W(x_1), W(x_2), ..., W(x_n)] denotes the diagonal matrix of the weights of the n points;
b = −[I_t(x_1), I_t(x_2), ..., I_t(x_n)]^T denotes the vector of the temporal derivatives of the n points at time t;
V = [A^T W^2 A]^(−1) A^T W^2 b denotes the optical flow sought;

where ∇I(x_n) denotes the image gradient at the n-th point, W(x_n) the window weight at the n-th point, and I_t(x_n) the temporal derivative at time t of the gray value of the n-th point, with x_i ∈ Ω and i = 1, 2, 3, ..., n.
The clustering process includes: setting a threshold D_th and comparing the metric function D of two optical flow vectors; if the metric value is smaller than the threshold, the two optical flows are merged into one optical flow class and the average optical flow of that class is computed; the metric function between each remaining optical flow vector and the average optical flow is then compared with the threshold, and if it is smaller than the threshold the optical flow is merged into that class.
Within the same optical flow class, if the distance d_i of a point (x_i, y_i) from the center of the class satisfies d_i ≥ (μ + 2σ), the point is deleted, where

d_i = (x_i − c_x)^2 + (y_i − c_y)^2

(c_x, c_y) denotes the center of the optical flow class, n denotes the total number of pixels in the class, and μ and σ are the mean and standard deviation of the distances d_i. μ is the mean (expectation) and the location parameter of the normal distribution, describing its central tendency; the normal distribution is symmetric about x = μ. σ is the standard deviation of the random variable and measures the dispersion of normally distributed data: the larger σ, the more dispersed the data; the smaller σ, the more concentrated. σ is also the shape parameter of the normal distribution curve: the larger its value, the flatter the curve; the smaller its value, the higher the curve.
The above metric function D is

D = sqrt[(u_1 − u_2)^2 + (v_1 − v_2)^2]

where (u_1, v_1) and (u_2, v_2) denote the two optical flow vectors.
Specifically, step (3) further comprises: converting the optical flow detection effect map into a grayscale image; denoising the grayscale image with the improved median filter; converting the denoised image into a binarized image; and obtaining the final moving targets through dilation and morphological closing operations.
Further, the improved median filtering adopts a fast parallel median filtering method: the maximum, the median and the minimum are computed for each column (or row) of the sliding window, yielding as many groups of data as the window has columns (or rows); the maximum of the sliding window is the maximum of the maximum group, the minimum of the sliding window is the minimum of the minimum group, and the median of the sliding window is the middle value of three numbers: the minimum of the maximum group, the median of the median group, and the maximum of the minimum group.
The advantages of the present invention are as follows. First, a method independent of prior knowledge of the target is adopted, so there is no need to train a classifier or construct target templates. Second, the optical flow method captures moving targets well, and after clustering the moving targets can be better marked with the optical flow information. Third, the improved median filtering removes part of the noise well and is nearly twice as fast as traditional median filtering. Finally, the continual iteration of the prediction and correction processes of the Kalman filter reaches a dynamic balance, iterates quickly, and achieves a good tracking effect.
Brief Description of the Drawings
Fig. 1 is the flow chart of the present invention;
Fig. 2 is the processed optical flow map of the moving targets;
Fig. 3 shows the median filtering effect;
Fig. 4 shows some of the tracking results.
Detailed Description
Embodiments of the present invention are described in detail below with reference to the accompanying drawings.
The algorithm flow of the method of the present invention is shown in Fig. 1. The specific implementation is as follows:
Step 1: The read-in video frames are processed with the Lucas-Kanade optical flow method based on the image pyramid. Light falling on the surface of an object gives its surface gray levels a certain spatial distribution; when a person observes a moving object, a continuously changing image forms on the retina, which is the optical flow. Optical flow expresses the change of the image, contains information about the moving targets, and can be used to determine their motion.
The L-K optical flow method first assumes that the motion vector remains constant over a small spatial neighborhood and then uses weighted least squares to estimate the optical flow field. Assuming that the gray value of point a = (x, y) at time t is I = I(x, y, t), the optical flow constraint equation can be derived:

(∂I/∂x)(dx/dt) + (∂I/∂y)(dy/dt) + ∂I/∂t = 0  (1)

where u = dx/dt and v = dy/dt are the horizontal and vertical components of the optical flow at this point and represent the motion information of the target. Equation (1) simplifies to:

I_x u + I_y v + I_t = 0  (2)

where I_x and I_y denote the spatial gradients of the gray value I = I(x, y, t) of point a = (x, y) along the x-axis and y-axis, respectively, and I_t denotes the temporal derivative of the gray value at time t.
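Equation (2) can be checked numerically. The sketch below (an illustration with an assumed synthetic Gaussian pattern and assumed flow values, not code from the patent) translates a smooth image by a known flow and verifies that I_x u + I_y v + I_t stays close to zero:

```python
import numpy as np

# Synthetic sequence: a Gaussian blob translating with a known flow (u, v).
# The pattern, image size, and flow values are illustrative assumptions.
u_true, v_true = 0.3, 0.2
y, x = np.mgrid[0:64, 0:64].astype(float)

def frame(t):
    # Gaussian blob whose center moves by (u_true, v_true) per unit time
    return np.exp(-((x - 32 - u_true * t) ** 2 + (y - 32 - v_true * t) ** 2) / (2 * 5.0 ** 2))

I0, I1 = frame(0.0), frame(1.0)
Ix = np.gradient(I0, axis=1)   # spatial gradient along x
Iy = np.gradient(I0, axis=0)   # spatial gradient along y
It = I1 - I0                   # temporal derivative (forward difference)

# The constraint holds only to first order, so the residual is small but nonzero.
residual = Ix * u_true + Iy * v_true + It
rel = np.abs(residual).sum() / np.abs(It).sum()
print(f"relative residual of the constraint: {rel:.3f}")
```

The residual comes from the first-order linearization; it shrinks as the per-frame displacement becomes small relative to the image structure, which is exactly the small-motion assumption the pyramid scheme later relaxes.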
Equation (2) can be written in the form of equation (3): the dot product of the gradient vector ∇I(a) at point a with the optical flow vector V_a of point a, plus I_t, equals zero:

∇I(a) · V_a + I_t = 0  (3)

where ∇I(a) = (I_x, I_y) denotes the gradient of the image at point a and V_a = (u, v) is the optical flow of point a. Assuming that the optical flow is the same at every point of a local neighborhood centered on point a, the sought displacement should minimize the matching error over the corresponding neighborhood; that is, the function defined on this neighborhood by equation (4) is minimized:

E(V_a) = Σ_{(x,y)∈Ω} W^2(x, y) [∇I(a) · V_a + I_t]^2  (4)

where Ω denotes the local neighborhood of point a and W(x, y) denotes the window weight function, which makes the central part of the neighborhood influence the optical flow constraint more strongly than the peripheral part. Solving the minimization problem (4) yields:
A = [∇I(x_1), ∇I(x_2), ..., ∇I(x_n)]^T denotes the column vector of the gradients at the n points;
W = diag[W(x_1), W(x_2), ..., W(x_n)] denotes the diagonal matrix of the weights of the n points;
b = −[I_t(x_1), I_t(x_2), ..., I_t(x_n)]^T denotes the vector of the temporal derivatives of the n points at time t;
V = [A^T W^2 A]^(−1) A^T W^2 b denotes the optical flow sought;

where ∇I(x_n) denotes the image gradient at the n-th point, W(x_n) the window weight at the n-th point, and I_t(x_n) the temporal derivative at time t of the gray value of the n-th point, with x_i ∈ Ω and i = 1, 2, 3, ..., n.
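The closed-form solution above can be verified with a small numeric sketch (the Gaussian weight profile, neighborhood size, and random gradients below are illustrative assumptions, not values from the patent). A, W and b are built for a known flow V_true, which the formula should recover:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 25                                      # points in the neighborhood Ω (assumed 5x5)
V_true = np.array([1.5, -0.5])              # known optical flow (u, v) to recover

A = rng.normal(size=(n, 2))                 # rows: gradients ∇I(x_i) = (I_x, I_y)
W = np.diag(np.exp(-np.linspace(0, 2, n)))  # diagonal weight matrix (assumed profile)

# Make the temporal derivatives exactly consistent with the constraint
# ∇I(x_i)·V + I_t(x_i) = 0, so b = -[I_t(x_1), ..., I_t(x_n)]^T = A V_true.
b = A @ V_true

# V = [A^T W^2 A]^(-1) A^T W^2 b
W2 = W @ W
V = np.linalg.solve(A.T @ W2 @ A, A.T @ W2 @ b)
print("recovered flow:", V)
```

With consistent, noise-free data the weighted and unweighted solutions coincide; the weights matter once the constraints at the n points are corrupted by noise, where W down-weights the peripheral pixels.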
Although the L-K optical flow method used alone can compute the optical flow and is insensitive to noise, when the motion of the target is large in scale or occlusion occurs, the large motion scale causes significant calculation errors, which not only affects the accuracy of the algorithm but also reduces the overall computation speed. The present invention therefore adopts the pyramid-based L-K optical flow method, whose principle is as follows. First, the optical flow and the affine transformation matrix are computed on the image of the highest layer. The computed result of one layer serves as the initial value for the next layer down, and on the basis of this initial value the optical flow and affine transformation matrix of that layer are computed. This process is repeated until the last layer, the original image layer, is reached. The final result is the optical flow and affine transformation matrix computed on the original image layer, realizing a coarse-to-fine screening process.
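The coarse-to-fine propagation can be sketched as follows for the simplified case of a single global translation (a minimal illustration with an assumed synthetic image; the full method also carries an affine transformation matrix per level, which is omitted here):

```python
import numpy as np

def downsample(img):
    # one pyramid level up: 2x2 box-filter averaging
    return 0.25 * (img[0::2, 0::2] + img[1::2, 0::2] + img[0::2, 1::2] + img[1::2, 1::2])

def warp(img, u, v):
    # bilinear sampling of img at (x + u, y + v), clamped at the borders
    h, w = img.shape
    yy, xx = np.mgrid[0:h, 0:w].astype(float)
    xs = np.clip(xx + u, 0, w - 1.0)
    ys = np.clip(yy + v, 0, h - 1.0)
    x0 = np.floor(xs).astype(int); y0 = np.floor(ys).astype(int)
    x1 = np.minimum(x0 + 1, w - 1); y1 = np.minimum(y0 + 1, h - 1)
    fx = xs - x0; fy = ys - y0
    return (img[y0, x0] * (1 - fx) * (1 - fy) + img[y0, x1] * fx * (1 - fy)
            + img[y1, x0] * (1 - fx) * fy + img[y1, x1] * fx * fy)

def lk_step(i1, i2):
    # one least-squares step for a single global (u, v) over the whole image
    Ix = np.gradient(i1, axis=1); Iy = np.gradient(i1, axis=0); It = i2 - i1
    A = np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                  [np.sum(Ix * Iy), np.sum(Iy * Iy)]])
    b = -np.array([np.sum(Ix * It), np.sum(Iy * It)])
    return np.linalg.solve(A, b)

def pyramid_lk(i1, i2, levels=3, iters=3):
    p1, p2 = [i1], [i2]
    for _ in range(levels - 1):
        p1.append(downsample(p1[-1])); p2.append(downsample(p2[-1]))
    u = v = 0.0
    for lvl in reversed(range(levels)):   # highest (coarsest) layer first
        u *= 2.0; v *= 2.0                # previous layer's result is the initial value
        for _ in range(iters):
            du, dv = lk_step(p1[lvl], warp(p2[lvl], u, v))
            u += du; v += dv
    return u, v

# Illustrative pair: a Gaussian blob shifted by a known (3, 2) pixels.
yy, xx = np.mgrid[0:64, 0:64].astype(float)
i1 = np.exp(-((xx - 32) ** 2 + (yy - 32) ** 2) / (2 * 8.0 ** 2))
i2 = np.exp(-((xx - 35) ** 2 + (yy - 34) ** 2) / (2 * 8.0 ** 2))
u, v = pyramid_lk(i1, i2)
print(f"estimated flow: ({u:.2f}, {v:.2f})")
```

At the coarsest layer the displacement is small enough for the linearized constraint to hold; doubling the estimate at each descent keeps the residual motion small at every layer, which is the point of the pyramid.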
Step 2: The optical flow is clustered on the basis that the optical flows on the same moving target are similar and their distribution exhibits a certain regularity, while the optical flows on different targets differ; several classes can thus be obtained. Assuming two optical flow vectors (u_1, v_1) and (u_2, v_2), the metric function measuring their similarity is set as follows:

D = sqrt[(u_1 − u_2)^2 + (v_1 − v_2)^2]  (5)

Optical flow clustering can be divided into coarse clustering and fine clustering. For coarse clustering, a small threshold D_th is set and the metric function D of two optical flow vectors is compared against it; if the metric value is smaller than the set threshold, the two optical flows are merged and the average optical flow of the class is computed. To judge whether another optical flow vector should be merged into a class, the value of the metric function between it and the average optical flow of the class is evaluated. Fine clustering then processes the optical flow classes obtained so far. However, owing to various noises, each class may contain some outliers, which must be removed. Suppose the pixels of a certain optical flow class are (x_1, y_1), (x_2, y_2), ..., (x_n, y_n).
Since the distances from the points of a class to the class center follow a normal distribution with parameters (μ, σ), if the distance d_i of a point (x_i, y_i) from the class center satisfies d_i ≥ (μ + 2σ), the point is deleted, where the parameters are as follows:

d_i = (x_i − c_x)^2 + (y_i − c_y)^2

(c_x, c_y) denotes the center of the class, n denotes the total number of pixels in the class, and μ and σ are the mean and standard deviation of the distances d_i. μ is the mean (expectation) and the location parameter of the normal distribution, describing its central tendency; the normal distribution is symmetric about x = μ. σ is the standard deviation of the random variable and measures the dispersion of normally distributed data: the larger σ, the more dispersed the data; the smaller σ, the more concentrated. σ is also the shape parameter of the normal distribution curve: the larger its value, the flatter the curve; the smaller its value, the higher the curve.
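The coarse clustering and the outlier rule can be sketched as follows (an illustrative greedy implementation; the metric D is assumed here to be the Euclidean distance between flow vectors, and the class center is taken as the mean of its pixel coordinates — the patent does not fix these details):

```python
import numpy as np

def coarse_cluster(flows, d_th):
    """Greedy merge: a flow joins the first class whose average flow is within d_th."""
    classes = []                        # each class: list of (u, v) flow vectors
    for f in flows:
        for c in classes:
            mean = np.mean(c, axis=0)   # average optical flow of the class
            if np.hypot(f[0] - mean[0], f[1] - mean[1]) < d_th:  # metric D (assumed Euclidean)
                c.append(f)
                break
        else:
            classes.append([f])         # start a new optical flow class
    return classes

def remove_outliers(points):
    """Delete points whose squared distance to the class center is >= mu + 2*sigma."""
    pts = np.asarray(points, dtype=float)
    center = pts.mean(axis=0)                 # class center (c_x, c_y)
    d = ((pts - center) ** 2).sum(axis=1)     # d_i = (x_i - c_x)^2 + (y_i - c_y)^2
    mu, sigma = d.mean(), d.std()
    return pts[d < mu + 2 * sigma]

# Illustrative data: two motions, and one class of pixels with a stray point.
rng = np.random.default_rng(1)
flows = np.vstack([rng.normal([5.0, 0.0], 0.2, (20, 2)),
                   rng.normal([0.0, 5.0], 0.2, (20, 2))])
classes = coarse_cluster(flows, d_th=2.0)
print("number of optical flow classes:", len(classes))   # one class per moving target

pixels = np.vstack([rng.normal([10.0, 10.0], 1.0, (20, 2)), [[60.0, 60.0]]])
print("pixels kept after outlier removal:", len(remove_outliers(pixels)))
```

The greedy pass mirrors the text: merge while D < D_th, track the running class mean, then prune points beyond μ + 2σ of the center in a second (fine) pass.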
After processing, the moving targets are clearly marked by the optical flow information, as shown in Fig. 2.
Step 3: Image processing techniques are highly application-specific; different applications and requirements call for different processing methods. Moreover, many factors affect the image, such as weather, illumination, camera shake and various noises, so the image must be denoised. The present invention selects the median filtering method for denoising. First, it is not the optical flow itself that is processed: the optical flow detection effect map (Fig. 2) is converted into a grayscale image. Second, the grayscale image is denoised by median filtering. The denoised image is then converted into a binarized image. Finally, the final targets are obtained through dilation and morphological closing operations. Under certain conditions, median filtering can handle the blurring of image detail produced by linear filters and is particularly effective against scanning noise and impulse noise.
The traditional median filter uses a sliding window containing an odd number of points and replaces the gray value of the specified point with the median of the gray values of the points in the window. For an odd number of elements, the median is the middle value after sorting by size; for an even number of elements, it is the average of the gray values of the middle two elements after sorting. The sorting algorithm is the key to median filtering. The traditional algorithm uses bubble sort: if the number of pixels in the window is m, each window requires m(m−1)/2 comparisons, for a time complexity of O(m^2). Moreover, a sort is performed every time the window moves, so for an image of size N*N the time complexity is O(m^2 N^2). For large images the computation is heavy and time-consuming. The present invention therefore uses a fast parallel median filtering method. For illustration, assume a 3*3 window whose pixels are as shown in Table 1:
Table 1. Pixels within the 3*3 window

F_0  F_1  F_2
F_3  F_4  F_5
F_6  F_7  F_8
Three groups of data are obtained by computing the maximum, the median and the minimum of each column of the window, where max denotes taking the maximum, med the median and min the minimum.

Maximum group: Max1 = max[F_0, F_3, F_6], Max2 = max[F_1, F_4, F_7], Max3 = max[F_2, F_5, F_8]
Median group: Med1 = med[F_0, F_3, F_6], Med2 = med[F_1, F_4, F_7], Med3 = med[F_2, F_5, F_8]
Minimum group: Min1 = min[F_0, F_3, F_6], Min2 = min[F_1, F_4, F_7], Min3 = min[F_2, F_5, F_8]
The maximum of the sliding window is the maximum of the maximum group, and the minimum is the minimum of the minimum group. The minimum of the median group is smaller than at least 5 pixel values, and the maximum of the median group is larger than at least 5 pixel values. Three candidate pixels then remain to be compared: the minimum of the maximum group, the median of the median group, and the maximum of the minimum group; the middle value of these three numbers is the median of the window, expressed as follows:

MininMax = min[Max1, Max2, Max3]
MedinMed = med[Med1, Med2, Med3]
MaxinMin = max[Min1, Min2, Min3]
Med = med[MininMax, MedinMed, MaxinMin]
This median filtering method is nearly twice as fast as the traditional algorithm. The processed result is shown in Fig. 3, where the white connected regions are the moving targets and the black region is the background.
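The column-wise max/med/min combination can be sketched per window and checked against a direct 9-element median (a minimal illustration; in a full implementation the column statistics are shared between neighboring windows, which is where the speed-up comes from):

```python
import numpy as np

def fast_median_3x3(w):
    """Median of a 3x3 window via the column max/med/min groups of Table 1."""
    w = np.asarray(w)
    col_max = w.max(axis=0)          # Max1, Max2, Max3
    col_med = np.median(w, axis=0)   # Med1, Med2, Med3
    col_min = w.min(axis=0)          # Min1, Min2, Min3
    candidates = [col_max.min(),     # MininMax
                  np.median(col_med),  # MedinMed
                  col_min.max()]     # MaxinMin
    return np.median(candidates)     # Med

# Check against the straightforward 9-element median on random windows.
rng = np.random.default_rng(2)
for _ in range(100):
    w = rng.permutation(9).reshape(3, 3)   # distinct values 0..8
    assert fast_median_3x3(w) == np.median(w)
print("fast 3x3 median matches np.median on 100 random windows")
```

Only the three surviving candidates need a final comparison, instead of fully sorting all nine window pixels as bubble sort does.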
步骤4:经上述过程处理后,通过光流聚类,得到类的个数为运动目标的个数。得到类中光流矢量的均值为运动目标的速度。将检测到的运动目标信息传递给卡尔曼滤波进行后续的跟踪处理。卡尔曼滤波以状态方程和观测方程为基础,运用递归方法来预测线性系统变化的方法。其状态方程和观测方程如下:xk=Ak,k-1xk-1+ξk-1 (6)Step 4: After the above process, through optical flow clustering, the number of classes obtained is the number of moving objects. The average value of the optical flow vector in the class is obtained as the speed of the moving target. Pass the detected moving target information to the Kalman filter for subsequent tracking processing. Kalman filtering is based on state equations and observation equations, and uses recursive methods to predict linear system changes. Its state equation and observation equation are as follows: x k =A k,k-1 x k-1 +ξ k-1 (6)
z_k = H_k x_k + η_k   (7)
Here x_{k-1} and x_k are the state vectors at times k-1 and k, respectively; z_k is the observation vector at time k; A_{k,k-1} is the state transition matrix from time k-1 to time k; H_k is the observation matrix; ξ_{k-1} and ξ_k are the system noise at the corresponding times, with ξ_k ∈ N(0, Q_k); η_k is the observation noise, with η_k ∈ N(0, R_k); Q_k and R_k are the covariances of the system noise ξ_k and the observation noise η_k, respectively. Kalman filtering can be summarized as a state prediction process and a state correction process, involving the following equations:
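Returning briefly to the clustering at the start of this step: the text does not spell out the clustering rule, so the sketch below takes one reasonable reading, grouping foreground flow points by spatial proximity and reporting each cluster's mean flow as the target velocity. The 20-pixel distance threshold is an assumed parameter, not a value from the text:

```python
import numpy as np

def cluster_flow_vectors(points, flows, dist_thresh=20.0):
    """Group foreground optical-flow points into spatial clusters.

    Returns one (mean position, mean flow) pair per cluster, i.e. one per
    moving target; the mean flow estimates the target's velocity.
    """
    points = np.asarray(points, dtype=float)
    flows = np.asarray(flows, dtype=float)
    labels = -np.ones(len(points), dtype=int)    # -1 means "not yet clustered"
    n_clusters = 0
    for seed in range(len(points)):
        if labels[seed] != -1:
            continue
        labels[seed] = n_clusters
        stack = [seed]
        while stack:                             # flood-fill points within the threshold
            j = stack.pop()
            near = np.where((labels == -1) &
                            (np.linalg.norm(points - points[j], axis=1) < dist_thresh))[0]
            labels[near] = n_clusters
            stack.extend(near.tolist())
        n_clusters += 1
    return [(points[labels == c].mean(axis=0), flows[labels == c].mean(axis=0))
            for c in range(n_clusters)]
```

Two well-separated groups of flow points thus yield two clusters, whose cluster count and mean flows are exactly the target count and velocities that Step 4 hands to the Kalman filter.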
State vector prediction equation:

x̂_{k,k-1} = A_{k,k-1} x̂_{k-1}   (8)

Error covariance prediction equation:

P_{k,k-1} = A_{k,k-1} P_{k-1} A_{k,k-1}^T + Q_{k-1}   (9)

Kalman filter gain:

K_k = P_{k,k-1} H_k^T (H_k P_{k,k-1} H_k^T + R_k)^{-1}   (10)

Corrected state vector:

x̂_k = x̂_{k,k-1} + K_k (z_k - H_k x̂_{k,k-1})   (11)

Corrected error covariance matrix:

P_k = P_{k,k-1} - K_k H_k P_{k,k-1}   (12)
Equations (8) and (9) constitute the state prediction process, while equations (10), (11), and (12) constitute the state correction process. The prediction, based on the state equation (6), produces the state prediction vector x̂_{k,k-1} and the error covariance prediction P_{k,k-1}. The correction, based on the observation equation (7), corrects the state prediction to obtain the vector x̂_k and computes the minimum-error covariance matrix P_k.
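Equations (8) through (12) map directly onto two small functions. This is a generic sketch of the predict/correct cycle, not tied to any particular choice of A, H, Q, or R:

```python
import numpy as np

def kalman_predict(x, P, A, Q):
    """State prediction process, equations (8)-(9)."""
    x_pred = A @ x              # eq. (8): project the state forward
    P_pred = A @ P @ A.T + Q    # eq. (9): project the error covariance forward
    return x_pred, P_pred

def kalman_update(x_pred, P_pred, z, H, R):
    """State correction process, equations (10)-(12)."""
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)   # eq. (10): Kalman gain
    x = x_pred + K @ (z - H @ x_pred)     # eq. (11): corrected state vector
    P = P_pred - K @ H @ P_pred           # eq. (12): corrected error covariance
    return x, P
```

A useful sanity check on the correction step: when the measurement z exactly equals the predicted observation H x̂_{k,k-1}, the innovation in equation (11) vanishes and the corrected state equals the predicted state.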
The motion of a target between adjacent video frames can be approximated as uniform motion. The kinematic equations are as follows, where Δt is the time interval between adjacent frames:
S_k = S_{k-1} + Δt·V_{k-1}   (13)

V_k = V_{k-1}   (14)
S_k is the displacement of the target after the time interval Δt, and V_k is the target's current velocity; since the motion is approximately uniform, the velocity is essentially the same at every instant.
The state vector of the Kalman filter is as follows:

x_k = [x(k), y(k), V_x(k), V_y(k)]^T   (15)

where x(k) and y(k) are the coordinates of the centre point of the moving target, and V_x(k) and V_y(k) are the velocity components of the target centre along the two coordinate axes. The corresponding state transition matrix is:

A = | 1  0  Δt  0 |
    | 0  1  0  Δt |
    | 0  0  1   0 |
    | 0  0  0   1 |   (16)
From the coordinate information of the target centre point, the observation vector and observation matrix are obtained as follows, where x_z(k) and y_z(k) are the centre-point coordinates of the rectangular box marking the target in frame k:

z_k = [x_z(k), y_z(k)]^T   (17)

H = | 1  0  0  0 |
    | 0  1  0  0 |   (18)
Substituting equations (15) through (18) into equations (6) and (7) and rearranging re-states the state equation and observation equation. Here ξ_{k-1} and η_k are the 4x1-dimensional system noise and the 2x1-dimensional observation noise, respectively; both are zero-mean Gaussian white noise. The covariance matrices Q and R of the system noise and the observation noise are set as diagonal matrices in equations (19) and (20), respectively.
Setting the variance of each of the two observation-noise components in equation (20) to 1 makes the observation-noise covariance matrix R the 2x2 identity matrix. In addition, an initial value for the error covariance matrix P_0 is set; a diagonal matrix, such as the 4x4 identity, is a typical choice.
Because the motion of the target between adjacent frames can be approximated as uniform, the inter-frame interval may be set to Δt = 1 for convenience of calculation. The initial value of the state vector can then be expressed as:

x_0 = [x(0), y(0), x(k) - x(k-1), y(k) - y(k-1)]^T
x_0 is the initial state of the Kalman filter; x(0) and y(0) are the centre-point coordinates of the target in the initial state; x(k) - x(k-1) and y(k) - y(k-1) are the displacements along the x-axis and y-axis between times k-1 and k, respectively.
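With Δt = 1, the constant-velocity model above reduces to the following concrete matrices. The process-noise magnitude and initial covariance here are placeholder choices for illustration (the text fixes R but not the numeric entries of Q or P_0):

```python
import numpy as np

dt = 1.0  # inter-frame interval, set to 1 as in the text

# State x = [x, y, Vx, Vy]^T: position advances by velocity each frame.
A = np.array([[1, 0, dt, 0],
              [0, 1, 0, dt],
              [0, 0, 1,  0],
              [0, 0, 0,  1]], dtype=float)

# Only the centre-point coordinates are observed.
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)

R = np.eye(2)           # unit observation-noise variances, as stated in the text
Q = 0.01 * np.eye(4)    # placeholder process-noise covariance (assumed value)
P0 = np.eye(4)          # placeholder initial error covariance (assumed value)
```

Applying A once to a state [0, 0, 1, 2]^T advances the position by the velocity while leaving the velocity unchanged, which is exactly the uniform-motion model of equations (13) and (14).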
Kalman filter tracking seeks a dynamic balance through the continual iteration of state prediction and state correction; when the number of iterations reaches a set value or a local optimum is attained, the final result is output. Sample results are shown in Figure 4.
Step 5: Determine whether the video has ended. Suppose the video to be processed has N frames and the current frame number is i; compare i with N. If i is less than N, the video has not ended: return to Step 1 and repeat the above process in a loop. If i is greater than or equal to N, the video has been fully processed and the program ends.
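The overall Step 1 through Step 5 loop can be sketched as follows; `detect_targets` and `kalman_step` are hypothetical stand-ins for the optical-flow detection stages (Steps 1-3) and the Kalman-filter tracking (Step 4):

```python
def track_video(frames, detect_targets, kalman_step):
    """Process every frame, then stop -- the loop structure of Step 5.

    frames: a sequence of N video frames; detect_targets and kalman_step
    are caller-supplied hooks standing in for Steps 1-3 and Step 4.
    """
    N = len(frames)
    results = []
    i = 0
    while i < N:                              # video not finished: keep looping
        targets = detect_targets(frames[i])   # Steps 1-3: optical-flow detection
        results.append(kalman_step(targets))  # Step 4: Kalman-filter tracking
        i += 1
    return results                            # i >= N: video processed, program ends
```

With trivial hooks the loop simply visits every frame once and stops, matching the i < N / i >= N termination test described above.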
Claims (8)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710011320.0A CN106803265A (en) | 2017-01-06 | 2017-01-06 | Multi-object tracking method based on optical flow method and Kalman filtering |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106803265A true CN106803265A (en) | 2017-06-06 |
Family
ID=58984634
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102800106A (en) * | 2012-06-29 | 2012-11-28 | 刘怡光 | Self-adaptation mean-shift target tracking method based on optical flow field estimation |
CN104200494A (en) * | 2014-09-10 | 2014-12-10 | 北京航空航天大学 | Real-time visual target tracking method based on light streams |
CN104978728A (en) * | 2014-04-08 | 2015-10-14 | 南京理工大学 | Image matching system of optical flow method |
Non-Patent Citations (6)

- 彭真明: 《光电图像处理及应用》, 电子科技大学出版社, 30 April 2013
- 朱俊杰等: 《基于金字塔LK算法的运动目标检测》, 《工业控制计算机》
- 李金宗等: 《一种基于特征光流检测的运动目标跟踪方法》, 《系统工程与电子技术》
- 杨杨等: 《一种基于特征光流的运动目标跟踪方法》, 《宇航学报》
- 苗启广等: 《多传感器图像融合技术及应用》, 30 April 2014
- 辛月兰: 《一种快速并行中值滤波算法的实现》, 《微型电脑应用》
Cited By (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107316275A (en) * | 2017-06-08 | 2017-11-03 | 宁波永新光学股份有限公司 | A kind of large scale Microscopic Image Mosaicing algorithm of light stream auxiliary |
US11024005B2 (en) | 2017-06-30 | 2021-06-01 | SZ DJI Technology Co., Ltd. | Optical flow tracking device and method |
WO2019000396A1 (en) * | 2017-06-30 | 2019-01-03 | 深圳市大疆创新科技有限公司 | Optical flow tracking device and method |
CN107292850A (en) * | 2017-07-03 | 2017-10-24 | 北京航空航天大学 | A kind of light stream parallel acceleration method based on Nearest Neighbor Search |
CN107292850B (en) * | 2017-07-03 | 2019-08-02 | 北京航空航天大学 | A kind of light stream parallel acceleration method based on Nearest Neighbor Search |
CN107454284A (en) * | 2017-09-13 | 2017-12-08 | 厦门美图之家科技有限公司 | A kind of video denoising method and computing device |
CN107454284B (en) * | 2017-09-13 | 2020-05-15 | 厦门美图之家科技有限公司 | Video denoising method and computing device |
CN108288021A (en) * | 2017-12-12 | 2018-07-17 | 深圳市深网视界科技有限公司 | A kind of crowd's accident detection method, electronic equipment and storage medium |
CN108872243A (en) * | 2018-04-28 | 2018-11-23 | 南昌航空大学 | A kind of bearing roller detection method of surface flaw, system and device |
CN108872243B (en) * | 2018-04-28 | 2020-11-13 | 南昌航空大学 | A bearing roller surface defect detection method, system and device |
CN110472458A (en) * | 2018-05-11 | 2019-11-19 | 深眸科技(深圳)有限公司 | A kind of unmanned shop order management method and system |
CN110084837A (en) * | 2019-05-15 | 2019-08-02 | 四川图珈无人机科技有限公司 | Object detecting and tracking method based on UAV Video |
CN110084837B (en) * | 2019-05-15 | 2022-11-04 | 四川图珈无人机科技有限公司 | Target detection and tracking method based on unmanned aerial vehicle video |
CN110415277A (en) * | 2019-07-24 | 2019-11-05 | 中国科学院自动化研究所 | Multi-target tracking method, system and device based on optical flow and Kalman filter |
CN110415277B (en) * | 2019-07-24 | 2022-03-08 | 中国科学院自动化研究所 | Multi-target tracking method, system and device based on optical flow and Kalman filter |
CN110796010A (en) * | 2019-09-29 | 2020-02-14 | 湖北工业大学 | Video image stabilization method combining optical flow method and Kalman filtering |
CN111311631A (en) * | 2020-01-19 | 2020-06-19 | 湖北文理学院 | Fluid velocity detection method, device and equipment in microfluidic chip |
CN111311631B (en) * | 2020-01-19 | 2024-03-08 | 湖北文理学院 | Method, device and equipment for detecting fluid speed in microfluidic chip |
CN111614965B (en) * | 2020-05-07 | 2022-02-01 | 武汉大学 | Unmanned aerial vehicle video image stabilization method and system based on image grid optical flow filtering |
CN111614965A (en) * | 2020-05-07 | 2020-09-01 | 武汉大学 | UAV video image stabilization method and system based on image grid optical flow filtering |
CN113709324A (en) * | 2020-05-21 | 2021-11-26 | 武汉Tcl集团工业研究院有限公司 | Video noise reduction method, video noise reduction device and video noise reduction terminal |
CN115018883A (en) * | 2022-06-22 | 2022-09-06 | 国网江苏省电力有限公司常州供电分公司 | An infrared autonomous inspection method for transmission line UAV based on optical flow and Kalman filtering |
WO2024000558A1 (en) * | 2022-07-01 | 2024-01-04 | 京东方科技集团股份有限公司 | Object tracking method, object tracking system and electronic device |
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20170606 |