
CN117078718A - Multi-target vehicle tracking method in expressway scene based on deep SORT - Google Patents

Multi-target vehicle tracking method in expressway scene based on deep SORT

Info

Publication number
CN117078718A
Authority
CN
China
Prior art keywords
target
frame
model
vehicle
highway
Prior art date
Legal status
Pending
Application number
CN202311020638.7A
Other languages
Chinese (zh)
Inventor
赵池航
李旋
覃晓明
解兴鹏
马欣怡
Current Assignee
Southeast University
Original Assignee
Southeast University
Priority date
Filing date
Publication date
Application filed by Southeast University
Priority to CN202311020638.7A
Publication of CN117078718A
Legal status: Pending

Classifications

    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G06T7/207 Analysis of motion for motion estimation over a hierarchy of resolutions
    • G06V10/40 Extraction of image or video features
    • G06V10/761 Proximity, similarity or dissimilarity measures
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V20/54 Surveillance or monitoring of activities, e.g. for recognising suspicious objects, of traffic, e.g. cars on the road, trains or boats
    • G06T2207/20084 Artificial neural networks [ANN]
    • G06T2207/30232 Surveillance
    • G06T2207/30241 Trajectory
    • G06V2201/07 Target detection
    • G06V2201/08 Detecting or categorising vehicles

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Traffic Control Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a DeepSORT-based method for tracking multiple target vehicles in an expressway scene, comprising the following steps: first, constructing a data set for evaluating the performance of a multi-target vehicle tracking model in an expressway scene; second, constructing a Faster-RCNN-VDHS model and a YOLOV5s-VDHS model for vehicle detection in the expressway scene; third, motion prediction and state estimation, namely modeling and estimating the position a target may reach in a future frame from the motion behavior and parameters of the target object in previous frames; then, constructing a DNFM-RDS model, a ResNet50-VRHS model, a DenseNet121-VRHS model and a ShuffleNetV2-VRHS model for vehicle re-identification, extracting the motion features and appearance features of the target detection boxes, and building a cost matrix from the similarity between the features; and finally, performing target vehicle association matching. The beneficial effects of the invention are that it can effectively track multiple target vehicles in an expressway scene and improve the monitoring efficiency of the expressway system, thereby promoting the development of smart expressway systems.

Description

Multi-target vehicle tracking method in highway scenes based on DeepSORT

Technical field

The invention belongs to the research fields of smart expressways and intelligent perception, and specifically relates to a DeepSORT-based multi-target vehicle tracking method in highway scenes.

Background

The rapid development of artificial intelligence provides new solutions and approaches for real-time monitoring in highway scenarios. An important task in the development of intelligent transportation systems is to use machine learning theory, deep learning methods and computer vision technology to accurately detect and track vehicles traveling on highways from highway surveillance video, enabling automated vehicle identification and localization, traffic statistics and abnormal-behavior detection, and providing scientific decision support for traffic management departments.

Because highways are open-air scenes, lighting changes significantly across time periods such as daytime, evening and night, camera lenses are often badly soiled, and most passing vehicles travel at speeds above 80 km/h, so video images blur easily, which reduces the accuracy of vehicle detection and tracking in highway scenes. The DeepSORT-based multi-target vehicle tracking technology for highway scenes in the present invention can effectively improve the efficiency of highway system monitoring and promote the development of smart expressway systems.

Summary of the invention

Purpose of the invention: to overcome the deficiencies of the existing technology, the invention provides a DeepSORT-based multi-target vehicle tracking method in highway scenes. The FRV-RV-DM, YV-RV-DM, FRV-DV-DM, YV-DV-DM, FRV-SV-DM, YV-SV-DM, FRV-DR-DM and YV-DR-DM network models are constructed for multi-target vehicle tracking in highway scenes; after model training, parameter optimization and model comparison, the method can effectively track multiple target vehicles in highway scenes, achieves good MOTA, MOTP and average detection time, and can provide technical support for highway system monitoring.

Technical solution: to achieve the above purpose, the present invention provides a DeepSORT-based multi-target vehicle tracking method in highway scenes, comprising the following steps:

S1: data set construction. Construct a data set for evaluating the performance of multi-target vehicle tracking models in highway scenes, covering three scenes: daytime, evening and night;

S2: target detection. Construct the Faster-RCNN-VDHS model and the YOLOV5s-VDHS model for vehicle detection in highway scenes, used to detect moving targets and obtain image information;

S3: motion prediction and state estimation. Model and estimate the position a target may reach in future frames from the motion behavior and parameters of the target object in previous frames;

S4: feature extraction and similarity calculation. Construct the DNFM-RDS, ResNet50-VRHS, DenseNet121-VRHS and ShuffleNetV2-VRHS models for vehicle re-identification, used to extract the motion features and appearance features of target detection boxes; add distance calculations based on motion information and appearance information, and build a cost matrix from the similarity between features;

S5: target vehicle association matching. Convert the target association problem into an optimal matching problem and optimally match the targets between frames;

S6: model training and parameter optimization. Select among the FRV-RV-DM, YV-RV-DM, FRV-DV-DM, YV-DV-DM, FRV-SV-DM, YV-SV-DM, FRV-DR-DM and YV-DR-DM models for multi-target vehicle tracking in highway scenes.

Further, in step S1, daytime, evening and night surveillance videos of a section of the Beijing-Shanghai Expressway are used, at 25 FPS and a resolution of 1920×1080. With the lightweight video annotation tool DarkLabel 2.4, the video streams are imported into the annotation tool and vehicles are assigned different IDs in the format of the MOT16 Challenge data set, constructing a vehicle tracking data set for highway scenes.

Further, in step S2, the Faster-RCNN-VDHS and YOLOV5s-VDHS models for vehicle detection in highway scenes are constructed to analyze the input frames of the video, identify and detect the moving targets in them, create trajectories corresponding to the detected targets, and initialize the motion variables for the next stage.

Further, in step S3, a Kalman filter is used as the motion model. After the Kalman filter is modeled on the vehicle bounding box in frame t-1, the predicted box of the target in frame t is obtained; then the detection box of frame t produced by the target detection method is taken as the observation, and the predicted box and the detection box are combined to obtain the optimal estimate of all target detection boxes in frame t of the current video; finally, this optimal estimate is used to obtain a new prediction for the target in frame t+1. Through continuous iteration, the estimate finally approaches the actual value of the target. An 8-dimensional space is used to describe the state of a vehicle trajectory in the highway scene at a given moment:

$$x = [u, v, r, h, \dot{u}, \dot{v}, \dot{r}, \dot{h}]^{T}$$

where (u, v) are the center coordinates of the vehicle detection box, r is the aspect ratio, h is the box height, and $(\dot{u}, \dot{v}, \dot{r}, \dot{h})$ is the velocity information of (u, v, r, h).

Further, step S4 includes obtaining the motion features and appearance features of the target detection boxes. The Mahalanobis distance based on the vehicle position motion features provides information that is effective for short-term prediction:

$$d^{(1)}(i,j) = (d_j - y_i)^{T} P_i^{-1} (d_j - y_i)$$

where $d^{(1)}(i,j)$ is the degree of matching between the j-th detection box and the i-th trajectory, $d_j$ is the position coordinate information of the j-th detection box, $y_i$ is the prediction of the current-frame trajectory for the next frame, and $P_i$ is the covariance matrix of the observation space at the current moment predicted by the Kalman filter.

Further, step S4 includes obtaining the motion features and appearance features of the target detection boxes. The cosine distance based on vehicle appearance features plays an important role in recovering a target's ID after long-term occlusion:

$$d^{(2)}(i,j) = \min\{\, 1 - r_j^{T} r_k^{(i)} \mid r_k^{(i)} \in R_i \,\}$$

where $r_j$ is subject to $\lVert r_j \rVert = 1$, $R_i$ is the feature vector library storing the entire tracking trajectory, and $r_k^{(i)}$ is one of the feature vectors in that library.

Further, the target vehicle association matching in step S5 includes the Hungarian algorithm, cascade matching and IOU matching; the target association problem is converted into an optimal matching problem, and the targets between frames are optimally matched.

Further, in step S6, ResNet50-VRHS, DenseNet121-VRHS, ShuffleNetV2-VRHS and DNFM-RDS are constructed as re-identification networks for feature extraction and similarity calculation, and are combined with the YOLOV5s-VDHS and Faster-RCNN-VDHS models into eight models, FRV-RV-DM, YV-RV-DM, FRV-DV-DM, YV-DV-DM, FRV-SV-DM, YV-SV-DM, FRV-DR-DM and YV-DR-DM, to build deep-learning-based vehicle tracking models for highway scenes.

Further, in step S6, based on the constructed vehicle tracking data set for highway scenes, validation-set feedback is provided through the loss function curves, and hyperparameters such as the number of training iterations, the learning rate and the weight decay are tuned in turn; the MOTA, MOTP and average detection time of each network model are obtained and comprehensively compared to select the optimal multi-target vehicle tracking method for highway scenes.

Further, MOTA and MOTP specifically mean: MOTA is the multi-object tracking accuracy, an important indicator of a multi-target tracking method's ability to continuously track vehicle trajectories; the closer its value is to 1, the better the tracking performance:

$$\mathrm{MOTA} = 1 - \frac{\sum_t (FN_t + FP_t + IDSW_t)}{\sum_t GT_t}$$

where $FN_t$ is the number of target vehicles missed in frame t, $FP_t$ is the number of target vehicles falsely detected in frame t, $IDSW_t$ is the number of target-vehicle ID switches on the vehicle trajectories in frame t, and $GT_t$ is the actual number of target vehicles annotated in frame t.

MOTP is the multi-object tracking precision, an important indicator of the detector's localization accuracy in a multi-target vehicle tracking method; it represents the average overlap between the detection boxes of the tracking algorithm and the manually annotated ground truth (GT):

$$\mathrm{MOTP} = \frac{\sum_{t,i} d_t^{\,i}}{\sum_t c_t}$$

where $c_t$ is the number of predicted boxes successfully matched with GT boxes in frame t, and $d_t^{\,i}$ is the intersection-over-union between the i-th matched target vehicle and its ground-truth box in frame t. The larger the MOTP value, the better the detector's localization performance.

The present invention combines the ResNet50-VRHS, DenseNet121-VRHS, ShuffleNetV2-VRHS and DNFM-RDS re-identification models for feature extraction and similarity calculation with the YOLOV5s-VDHS and Faster-RCNN-VDHS detection models to build multi-target vehicle tracking models for highway scenes. Through model training and parameter optimization, it is found that, in the ability to continuously track vehicles in highway scenes, the tracking models built on the YOLOV5s-VDHS detector and the DeepSORT-MVTHS framework outperform those built on the Faster-RCNN-VDHS detector and the DeepSORT-MVTHS framework; in vehicle localization accuracy in highway scenes, the tracking models built on the Faster-RCNN-VDHS detector and the DeepSORT-MVTHS framework outperform those built on the YOLOV5s-VDHS detector and the DeepSORT-MVTHS framework.

The beneficial effects of the present invention are: the constructed DeepSORT-based multi-target vehicle tracking method for highway scenes, after model training, parameter optimization and model comparison, can effectively track multiple target vehicles in highway scenes, achieves good MOTA, MOTP and average detection time, and can provide technical support for highway system monitoring.

Description of the drawings

Figure 1 is the multi-target tracking flow chart.

Figure 2 is an annotation example from the surveillance video data set for the vehicle tracking method in highway scenes.

Figure 3 is the DeepSORT-MVTHS target vehicle association matching flow chart.

Figure 4 is the structure diagram of the YV-RV-DM model.

Figure 5 is the structure diagram of the FRV-SV-DM model.

Figure 6 compares the evaluation metrics of the vehicle tracking models in highway scenes.

Figure 7 compares the per-frame image processing time of the vehicle tracking models in highway scenes.

Detailed description of the embodiments

The present invention is further explained below in conjunction with the accompanying drawings and specific embodiments.

The present invention provides a DeepSORT-based multi-target vehicle tracking method in highway scenes, which specifically includes the following steps:

The overall pipeline of the detection-based tracking method can be divided into a target detection stage, a motion prediction stage, a feature extraction and similarity calculation stage, and a data association stage, as shown in Figure 1.

S1: data set construction. A data set for evaluating the performance of multi-target vehicle tracking models in highway scenes is constructed, covering three scenes: daytime, evening and night. Daytime, evening and night surveillance videos of a section of the Beijing-Shanghai Expressway are used, at 25 FPS and a resolution of 1920×1080. With the lightweight video annotation tool DarkLabel 2.4, the video streams are imported into the annotation tool and vehicles are assigned different IDs in the format of the MOT16 Challenge data set, constructing a vehicle tracking data set for highway scenes, as shown in Figure 2, where each record includes the video frame sequence number, the vehicle ID, the x and y coordinates of the top-left vertex of the annotated target box, the target box width and the target box height.
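As an illustration only, a minimal Python sketch of reading annotations laid out in this MOT16 style; the file name gt.txt and strict comma separation are assumptions, not details given in the patent:

```python
import csv
from collections import defaultdict

def load_mot16_annotations(path="gt.txt"):
    """Parse MOT16-style rows: frame, vehicle ID, box left-x, top-y, width, height.

    Returns {frame_number: [(vehicle_id, x, y, w, h), ...]}.
    """
    frames = defaultdict(list)
    with open(path, newline="") as f:
        for row in csv.reader(f):
            frame, vid = int(row[0]), int(row[1])
            x, y, w, h = map(float, row[2:6])
            frames[frame].append((vid, x, y, w, h))
    return frames
```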

S2: The Faster-RCNN-VDHS and YOLOV5s-VDHS models for vehicle detection in highway scenes are constructed to analyze the input frames of the video, identify and detect the moving targets in them, create trajectories corresponding to the detected targets, and initialize the motion variables for the next stage;

S3: A Kalman filter is used as the motion model. After the Kalman filter is modeled on the vehicle bounding box in frame t-1, the predicted box of the target in frame t is obtained; then the detection box of frame t produced by the target detection method is taken as the observation, and the predicted box and the detection box are combined to obtain the optimal estimate of all target detection boxes in frame t of the current video. Finally, this optimal estimate is used to obtain a new prediction for the target in frame t+1; through continuous iteration, the estimate finally approaches the actual value of the target. An 8-dimensional space is used to describe the state of a vehicle trajectory in the highway scene at a given moment:

$$x = [u, v, r, h, \dot{u}, \dot{v}, \dot{r}, \dot{h}]^{T}$$

where (u, v) are the center coordinates of the vehicle detection box, r is the aspect ratio, h is the box height, and $(\dot{u}, \dot{v}, \dot{r}, \dot{h})$ is the velocity information of (u, v, r, h);

The prediction equation of the state vector, given below, describes the transfer of the vehicle motion state from frame t-1 to frame t, where F is the state transition matrix. During this state transfer, the uncertainty of the vehicle motion state must also be described. The Kalman filter assumes that all state variables in the system obey Gaussian distributions, so each variable can be described by its mean and variance.

$$X_t = F X_{t-1}$$

The following formula gives a preliminary estimate of the vehicle motion state and its uncertainty, where P is the covariance matrix describing the uncertainty of the vehicle motion state (the larger the values in P, the higher the uncertainty) and Q is the system noise matrix describing the noise in the prediction of the vehicle motion state.

$$P_t = F P_{t-1} F^{T} + Q$$
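A minimal NumPy sketch of these two prediction equations together with the correction step described above, assuming a constant-velocity model over the 8-dimensional state; the noise matrices Q and R below are placeholder values, since the patent does not specify them:

```python
import numpy as np

def make_cv_transition(dim=4):
    """State transition F for a constant-velocity model: position += velocity."""
    F = np.eye(2 * dim)
    F[:dim, dim:] = np.eye(dim)
    return F

def kalman_predict(x, P, F, Q):
    """X_t = F X_{t-1};  P_t = F P_{t-1} F^T + Q."""
    return F @ x, F @ P @ F.T + Q

def kalman_update(x, P, z, H, R):
    """Correct the prediction with an observed detection box z = [u, v, r, h]."""
    S = H @ P @ H.T + R                 # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)      # Kalman gain
    x = x + K @ (z - H @ x)
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P

# Example: predict one frame ahead, then fuse a detection.
F, H = make_cv_transition(), np.eye(4, 8)
x, P = np.zeros(8), np.eye(8)
Q, R = 0.01 * np.eye(8), 0.1 * np.eye(4)   # placeholder noise values
x, P = kalman_predict(x, P, F, Q)
x, P = kalman_update(x, P, np.array([960.0, 540.0, 0.5, 120.0]), H, R)
```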

S4: This step includes obtaining the motion features and appearance features of the target detection boxes. The Mahalanobis distance based on the vehicle position motion features provides information that is effective for short-term prediction:

$$d^{(1)}(i,j) = (d_j - y_i)^{T} P_i^{-1} (d_j - y_i)$$

where $d^{(1)}(i,j)$ is the degree of matching between the j-th detection box and the i-th trajectory, $d_j$ is the position coordinate information of the j-th detection box, $y_i$ is the prediction of the current-frame trajectory for the next frame, and $P_i$ is the covariance matrix of the observation space at the current moment predicted by the Kalman filter.
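A short sketch of this squared Mahalanobis distance, with y_i and P_i assumed to come from the Kalman prediction step above:

```python
import numpy as np

def squared_mahalanobis(d_j, y_i, P_i):
    """d^(1)(i, j) = (d_j - y_i)^T P_i^{-1} (d_j - y_i)."""
    diff = np.asarray(d_j, dtype=float) - np.asarray(y_i, dtype=float)
    return float(diff @ np.linalg.solve(P_i, diff))
```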

The cosine distance based on vehicle appearance features plays an important role in recovering a target's ID after long-term occlusion:

$$d^{(2)}(i,j) = \min\{\, 1 - r_j^{T} r_k^{(i)} \mid r_k^{(i)} \in R_i \,\}$$

where $r_j$ is subject to $\lVert r_j \rVert = 1$, $R_i$ is the feature vector library storing the entire tracking trajectory, and $r_k^{(i)}$ is one of the feature vectors in that library.
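A matching sketch of the minimum cosine distance over a track's appearance gallery, assuming the gallery rows are the stored feature vectors:

```python
import numpy as np

def min_cosine_distance(r_j, R_i):
    """d^(2)(i, j) = min over r_k in R_i of (1 - r_j . r_k)."""
    r_j = np.asarray(r_j, dtype=float)
    r_j = r_j / np.linalg.norm(r_j)                    # enforce ||r_j|| = 1
    gallery = np.asarray(R_i, dtype=float)
    gallery = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    return float(np.min(1.0 - gallery @ r_j))
```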

After the degree of matching between detected vehicles and trajectories in the highway scene has been computed, gating must be performed through a threshold matrix before the cost matrix can be used for the subsequent data association. The threshold matrix is computed as follows:

$$b_{i,j}^{(1)} = \mathbb{1}\left[d^{(1)}(i,j) \le t^{(1)}\right], \qquad b_{i,j}^{(2)} = \mathbb{1}\left[d^{(2)}(i,j) \le t^{(2)}\right], \qquad b_{i,j} = b_{i,j}^{(1)} \cdot b_{i,j}^{(2)}$$

where $b_{i,j}^{(1)}$ is the indicator for the Mahalanobis distance, comparing it with a chi-square distribution threshold: the match is admitted only when the Mahalanobis distance is below the threshold $t^{(1)}$; $b_{i,j}^{(2)}$ is the indicator for the cosine distance: the match is admitted only when the cosine distance is below the preset hyperparameter $t^{(2)}$; and $b_{i,j}$ is the joint indicator, so a pair is considered a preliminary match only when its value is 1.
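A sketch of this gating rule; 9.4877 is the 95% chi-square quantile for the 4-dimensional measurement space used by the original DeepSORT, while t(2) = 0.2 is an assumed placeholder, since the patent does not state the appearance threshold:

```python
T1 = 9.4877  # chi-square 95% quantile, 4 degrees of freedom

def gate(d1, d2, t1=T1, t2=0.2):
    """b = b^(1) * b^(2): admit the (track, detection) pair only if both
    the Mahalanobis distance d1 and the cosine distance d2 pass."""
    return int(d1 <= t1) * int(d2 <= t2)
```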

S5: Target vehicle association matching includes the Hungarian algorithm, cascade matching and IOU matching; the target association problem is converted into an optimal matching problem, and the targets between frames are optimally matched, as shown in Figure 3. First, target detection is performed on the candidate image to obtain candidate vehicle detection boxes, and the tracking trajectories that the Kalman filter predicts to be in the confirmed state are cascade-matched with the detected target boxes. The results fall into three cases: matched tracks (MT), unmatched detections (UD) and unmatched tracks (UT). In the matched case, i.e., when the predicted bounding box matches the vehicle detection bounding box of the current frame, only a Kalman filter update of the trajectory is needed. For unmatched detections and unmatched tracks, the intersection-over-union must be computed for IOU matching. Tracking trajectories that the Kalman filter predicts to be in the unconfirmed state are matched with the detected target boxes directly by IOU, and the resulting cost matrix is used as the input of the Hungarian algorithm to realize the association. After IOU matching, for matched pairs, the Kalman filter updates the previous trajectory; for unmatched tracks, if the track is in the unconfirmed state it is deleted directly, otherwise it enters the deleted state only after failing to match a certain number of times; for unmatched detections, i.e., when a new target appears in the current frame that does not exist among the predicted bounding boxes, a new trajectory must be initialized for that target.
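A sketch of the assignment step with SciPy's Hungarian solver; the cascade and IOU stages would feed it different gated cost matrices, and the returned split mirrors the MT / UT / UD cases above:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(cost, gate_mask, blocked=1e5):
    """Hungarian matching over a gated cost matrix of shape (tracks, detections).

    Returns (matches, unmatched_tracks, unmatched_detections)."""
    cost = np.where(gate_mask == 1, cost, blocked)
    rows, cols = linear_sum_assignment(cost)
    matches = [(r, c) for r, c in zip(rows, cols) if cost[r, c] < blocked]
    mt = {r for r, _ in matches}
    md = {c for _, c in matches}
    unmatched_tracks = [r for r in range(cost.shape[0]) if r not in mt]
    unmatched_dets = [c for c in range(cost.shape[1]) if c not in md]
    return matches, unmatched_tracks, unmatched_dets
```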

S6-1: ResNet50-VRHS, DenseNet121-VRHS, ShuffleNetV2-VRHS and DNFM-RDS are constructed as re-identification networks for feature extraction and similarity calculation, and are combined with the YOLOV5s-VDHS and Faster-RCNN-VDHS models into eight models, FRV-RV-DM, YV-RV-DM, FRV-DV-DM, YV-DV-DM, FRV-SV-DM, YV-SV-DM, FRV-DR-DM and YV-DR-DM, to build deep-learning-based vehicle tracking models for highway scenes.

Taking the YV-RV-DM and FRV-SV-DM models as examples: the structure of the YV-RV-DM model is shown in Figure 4. First, the highway surveillance video frame sequence is passed through the Focus module, the CBL standard convolution module, the CSP1_X module, the SPP spatial pyramid pooling module and the non-maximum suppression (NMS) module of the YOLOV5s-VDHS model to locate the target vehicles under highway surveillance. Next, a Kalman filter predicts the vehicle's 8-dimensional state vector in the current frame from the vehicle trajectory; at the same time, the vehicle detection box is passed through the Conv1_x, Conv2_x, Conv3_x, Conv4_x and Conv5_x modules of the ResNet50-VRHS model and 7×7 average pooling to output the 1×1×2048-dimensional vehicle appearance feature vector before the fully connected layer, and the minimum cosine distance between this vector and the feature set of the 100 most recent successful associations of each vehicle trajectory is computed to build the cost matrix. The structure of the FRV-SV-DM model is shown in Figure 5. The highway surveillance video frames are input into the Faster-RCNN-VDHS network and processed by the 13 convolutional layers, 4 max pooling layers and 3 fully connected layers of VGG-16 to output a 1×1×1000 global feature, and the RPN network and the ROI pooling and classification modules locate the target vehicles in the highway scene. Next, a Kalman filter predicts the vehicle's 8-dimensional state vector in the current frame from the vehicle trajectory; at the same time, the vehicle detection box is passed through the convolution layer, max pooling layer, Stage2, Stage3 and Stage4 modules of the ShuffleNetV2-VRHS model to output the 1×1×1024-dimensional vehicle appearance feature vector before the fully connected layer, and the minimum cosine distance between this vector and the feature set of the 100 most recent successful associations of each vehicle trajectory is computed to build the cost matrix.
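For the appearance branch, a sketch of obtaining the 1×1×2048 feature by truncating a stock torchvision ResNet-50 before its fully connected layer; the trained ResNet50-VRHS weights are not available, so generic ImageNet weights stand in (requires a recent torchvision):

```python
import torch
import torchvision

backbone = torchvision.models.resnet50(weights="IMAGENET1K_V2")
extractor = torch.nn.Sequential(*list(backbone.children())[:-1]).eval()

@torch.no_grad()
def embed(crops):
    """crops: (N, 3, 224, 224) float tensor of vehicle detection boxes.
    Returns (N, 2048) unit-norm appearance vectors for the cosine distance."""
    feats = extractor(crops).flatten(1)
    return torch.nn.functional.normalize(feats, dim=1)
```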

S6-2: Based on the constructed vehicle tracking data set for highway scenes, validation-set feedback is provided through the loss function curves, and hyperparameters such as the number of training iterations, the learning rate and the weight decay are tuned in turn; the MOTA, MOTP and average detection time of each network model are obtained, as shown in Table 1. MOTA and MOTP specifically mean: MOTA is the multi-object tracking accuracy, an important indicator of a multi-target tracking method's ability to continuously track vehicle trajectories; the closer its value is to 1, the better the tracking performance:

$$\mathrm{MOTA} = 1 - \frac{\sum_t (FN_t + FP_t + IDSW_t)}{\sum_t GT_t}$$

where $FN_t$ is the number of target vehicles missed in frame t, $FP_t$ is the number of target vehicles falsely detected in frame t, $IDSW_t$ is the number of target-vehicle ID switches on the vehicle trajectories in frame t, and $GT_t$ is the actual number of target vehicles annotated in frame t.
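A direct sketch of the MOTA formula over per-frame counts:

```python
def mota(fn, fp, idsw, gt):
    """MOTA = 1 - sum_t(FN_t + FP_t + IDSW_t) / sum_t GT_t,
    with fn, fp, idsw, gt given as per-frame count sequences."""
    return 1.0 - (sum(fn) + sum(fp) + sum(idsw)) / sum(gt)
```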

MOTP is the multi-object tracking precision, an important indicator of the detector's localization accuracy in a multi-target vehicle tracking method; it represents the average overlap between the detection boxes of the tracking algorithm and the manually annotated GT:

$$\mathrm{MOTP} = \frac{\sum_{t,i} d_t^{\,i}}{\sum_t c_t}$$

where $c_t$ is the number of predicted boxes successfully matched with GT boxes in frame t, and $d_t^{\,i}$ is the intersection-over-union between the i-th matched target vehicle and its ground-truth box in frame t. The larger the MOTP value, the better the detector's localization performance.
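And a matching sketch of MOTP as the mean IoU over successfully matched pairs:

```python
def motp(matched_ious_per_frame):
    """MOTP = sum_{t,i} IoU_t^i / sum_t c_t, where each inner list holds the
    IoUs of the predicted boxes successfully matched to GT in frame t."""
    total = sum(iou for frame in matched_ious_per_frame for iou in frame)
    count = sum(len(frame) for frame in matched_ious_per_frame)
    return total / count
```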

Table 1. Evaluation metrics of the eight models (MOTA, MOTP and average detection time).

As shown in Figure 6, all eight models, FRV-RV-DM, YV-RV-DM, FRV-DV-DM, YV-DV-DM, FRV-SV-DM, YV-SV-DM, FRV-DR-DM and YV-DR-DM, perform well in vehicle tracking on the highway-scene surveillance video data set, with both MOTA and MOTP reaching about 80%. The YV-RV-DM model, with YOLOV5s-VDHS as the detector and ResNet50-VRHS as the re-identification network, achieves 82.1% MOTA and 80.5% MOTP; the FRV-RV-DM model, with Faster-RCNN-VDHS as the detector and ResNet50-VRHS as the re-identification network, achieves 79.7% MOTA and 85.3% MOTP; the YV-DV-DM model, with YOLOV5s-VDHS as the detector and DenseNet121-VRHS as the re-identification network, achieves 82.6% MOTA and 80.8% MOTP; the FRV-DV-DM model, with Faster-RCNN-VDHS as the detector and DenseNet121-VRHS as the re-identification network, achieves 80.5% MOTA and 85.4% MOTP; the YV-SV-DM model, with YOLOV5s-VDHS as the detector and ShuffleNetV2-VRHS as the re-identification network, achieves 81.9% MOTA and 80.3% MOTP; the FRV-SV-DM model, with Faster-RCNN-VDHS as the detector and ShuffleNetV2-VRHS as the re-identification network, achieves 79.4% MOTA and 85.0% MOTP; the YV-DR-DM model, with YOLOV5s-VDHS as the detector and DNFM-RDS as the re-identification network, achieves 83.2% MOTA and 81.1% MOTP; and the FRV-DR-DM model, with Faster-RCNN-VDHS as the detector and DNFM-RDS as the re-identification network, achieves 81.0% MOTA and 85.8% MOTP. In the ability to continuously track vehicles, with the same re-identification network, the MOTA of the highway-scene tracking models built on the YOLOV5s-VDHS detector and the DeepSORT-MVTHS framework is consistently higher than that of the models built on the Faster-RCNN-VDHS detector; in localization accuracy, the MOTP of the models built on the Faster-RCNN-VDHS detector and the DeepSORT-MVTHS framework is consistently higher than that of the models built on the YOLOV5s-VDHS detector.

The average per-frame processing times of the eight models FRV-RV-DM, YV-RV-DM, FRV-DV-DM, YV-DV-DM, FRV-SV-DM, YV-SV-DM, FRV-DR-DM and YV-DR-DM are shown in Figure 7: 27 ms per frame for YV-RV-DM; 55 ms for FRV-RV-DM; 31 ms for YV-DV-DM; 62 ms for FRV-DV-DM; 22 ms for YV-SV-DM; 50 ms for FRV-SV-DM; 81 ms for YV-DR-DM; and 108 ms for FRV-DR-DM.

The present invention is a DeepSORT-based multi-target vehicle tracking method in highway scenes. The DeepSORT-MVTHS framework for multi-target vehicle tracking in highway scenes and the eight models FRV-RV-DM, YV-RV-DM, FRV-DV-DM, YV-DV-DM, FRV-SV-DM, YV-SV-DM, FRV-DR-DM and YV-DR-DM for vehicle tracking in highway scenes were constructed, and comparative experiments were conducted on the constructed highway-scene vehicle video surveillance data set. The experimental results show that, in the ability to continuously track vehicles in highway scenes, the tracking models built on the YOLOV5s-VDHS detector and the DeepSORT-MVTHS framework outperform those built on the Faster-RCNN-VDHS detector and the DeepSORT-MVTHS framework; in vehicle localization accuracy in highway scenes, the tracking models built on the Faster-RCNN-VDHS detector and the DeepSORT-MVTHS framework outperform those built on the YOLOV5s-VDHS detector and the DeepSORT-MVTHS framework.

The technical means disclosed in the solution of the present invention are not limited to those disclosed in the above embodiments, and also include technical solutions composed of any combination of the above technical features. It should be pointed out that, for those of ordinary skill in the art, several improvements and refinements can be made without departing from the principles of the present invention, and these improvements and refinements are also regarded as falling within the protection scope of the present invention.

Claims (10)

1. A DeepSORT-based multi-target vehicle tracking method in highway scenes, characterized by comprising the following steps:
S1: data set construction: constructing a data set for evaluating the performance of multi-target vehicle tracking models in highway scenes, covering three scenes: daytime, evening and night;
S2: target detection: constructing the Faster-RCNN-VDHS and YOLOV5s-VDHS models for vehicle detection in highway scenes, used to detect moving targets and obtain image information;
S3: motion prediction and state estimation: modeling and estimating the position a target may reach in future frames from the motion behavior and parameters of the target object in previous frames;
S4: feature extraction and similarity calculation: constructing the DNFM-RDS, ResNet50-VRHS, DenseNet121-VRHS and ShuffleNetV2-VRHS models for vehicle re-identification, used to extract the motion features and appearance features of target detection boxes, adding distance calculations based on motion information and appearance information, and building a cost matrix from the similarity between features;
S5: target vehicle association matching: converting the target association problem into an optimal matching problem and optimally matching the targets between frames;
S6: model training and parameter optimization: selecting among the FRV-RV-DM, YV-RV-DM, FRV-DV-DM, YV-DV-DM, FRV-SV-DM, YV-SV-DM, FRV-DR-DM and YV-DR-DM models for multi-target vehicle tracking in highway scenes.

2. The DeepSORT-based multi-target vehicle tracking method in highway scenes according to claim 1, characterized in that: in step S1, daytime, evening and night surveillance videos of a section of a highway are used, at 25 FPS and a resolution of 1920×1080; with the lightweight video annotation tool DarkLabel 2.4, the video streams are imported into the annotation tool and vehicles are assigned different IDs in the format of the MOT16 Challenge data set to construct a vehicle tracking data set for highway scenes.

3. The DeepSORT-based multi-target vehicle tracking method in highway scenes according to claim 1, characterized in that: in step S2, the Faster-RCNN-VDHS and YOLOV5s-VDHS models for vehicle detection in highway scenes are constructed to analyze the input frames of the video, identify and detect the moving targets in them, create trajectories corresponding to the detected targets, and initialize the motion variables for the next stage.

4. The DeepSORT-based multi-target vehicle tracking method in highway scenes according to claim 1, characterized in that: in step S3, a Kalman filter is used as the motion model; after the Kalman filter is modeled on the vehicle bounding box in frame t-1, the predicted box of the target in frame t is obtained; then the detection box of frame t produced by the target detection method is taken as the observation, and the predicted box and the detection box are combined to obtain the optimal estimate of all target detection boxes in frame t of the current video; finally, this optimal estimate is used to obtain a new prediction for the target in frame t+1, and through continuous iteration the estimate approaches the actual value of the target; an 8-dimensional space is used to describe the state of a vehicle trajectory in the highway scene at a given moment:

$$x = [u, v, r, h, \dot{u}, \dot{v}, \dot{r}, \dot{h}]^{T}$$

where (u, v) are the center coordinates of the vehicle detection box, r is the aspect ratio, h is the box height, and $(\dot{u}, \dot{v}, \dot{r}, \dot{h})$ is the velocity information of (u, v, r, h).

5. The DeepSORT-based multi-target vehicle tracking method in highway scenes according to claim 1, characterized in that: step S4 includes obtaining the motion features and appearance features of the target detection boxes, and the Mahalanobis distance based on the vehicle position motion features provides information that is effective for short-term prediction:

$$d^{(1)}(i,j) = (d_j - y_i)^{T} P_i^{-1} (d_j - y_i)$$

where $d^{(1)}(i,j)$ is the degree of matching between the j-th detection box and the i-th trajectory, $d_j$ is the position coordinate information of the j-th detection box, $y_i$ is the prediction of the current-frame trajectory for the next frame, and $P_i$ is the covariance matrix of the observation space at the current moment predicted by the Kalman filter.

6. The DeepSORT-based multi-target vehicle tracking method in highway scenes according to claim 1, characterized in that: step S4 includes obtaining the motion features and appearance features of the target detection boxes, and the cosine distance based on vehicle appearance features plays an important role in recovering a target's ID after long-term occlusion:

$$d^{(2)}(i,j) = \min\{\, 1 - r_j^{T} r_k^{(i)} \mid r_k^{(i)} \in R_i \,\}$$

where $r_j$ is subject to $\lVert r_j \rVert = 1$, $R_i$ is the feature vector library storing the entire tracking trajectory, and $r_k^{(i)}$ is one of the feature vectors in that library.

7. The DeepSORT-based multi-target vehicle tracking method in highway scenes according to claim 1, characterized in that: the target vehicle association matching in step S5 includes the Hungarian algorithm, cascade matching and IOU matching; the target association problem is converted into an optimal matching problem, and the targets between frames are optimally matched.

8. The DeepSORT-based multi-target vehicle tracking method in highway scenes according to claim 1, characterized in that: in step S6, ResNet50-VRHS, DenseNet121-VRHS, ShuffleNetV2-VRHS and DNFM-RDS are constructed as re-identification networks for feature extraction and similarity calculation, and are combined with the YOLOV5s-VDHS and Faster-RCNN-VDHS models into eight models, FRV-RV-DM, YV-RV-DM, FRV-DV-DM, YV-DV-DM, FRV-SV-DM, YV-SV-DM, FRV-DR-DM and YV-DR-DM, to build deep-learning-based vehicle tracking models for highway scenes.

9. The DeepSORT-based multi-target vehicle tracking method in highway scenes according to claim 1, characterized in that: in step S6, based on the constructed vehicle tracking data set for highway scenes, validation-set feedback is provided through the loss function curves; the number of training iterations, the learning rate and the weight decay are tuned in turn; the MOTA, MOTP and average detection time of each network model are obtained; and the optimal multi-target vehicle tracking method for highway scenes is selected by comprehensive comparison.

10. The DeepSORT-based multi-target vehicle tracking method in highway scenes according to claim 9, characterized in that: MOTA is the multi-object tracking accuracy, an important indicator of a multi-target tracking method's ability to continuously track vehicle trajectories, and the closer its value is to 1, the better the tracking performance:

$$\mathrm{MOTA} = 1 - \frac{\sum_t (FN_t + FP_t + IDSW_t)}{\sum_t GT_t}$$

where $FN_t$ is the number of target vehicles missed in frame t, $FP_t$ is the number of target vehicles falsely detected in frame t, $IDSW_t$ is the number of target-vehicle ID switches on the vehicle trajectories in frame t, and $GT_t$ is the actual number of target vehicles annotated in frame t; MOTP is the multi-object tracking precision, an important indicator of the detector's localization accuracy, representing the average overlap between the detection boxes of the tracking algorithm and the manually annotated GT:

$$\mathrm{MOTP} = \frac{\sum_{t,i} d_t^{\,i}}{\sum_t c_t}$$

where $c_t$ is the number of predicted boxes successfully matched with GT boxes in frame t, and $d_t^{\,i}$ is the intersection-over-union between the i-th matched target vehicle and its ground-truth box in frame t; the larger the MOTP value, the better the detector's localization performance.
CN202311020638.7A 2023-08-14 2023-08-14 Multi-target vehicle tracking method in expressway scene based on deep SORT Pending CN117078718A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311020638.7A CN117078718A (en) 2023-08-14 2023-08-14 Multi-target vehicle tracking method in expressway scene based on deep SORT

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311020638.7A CN117078718A (en) 2023-08-14 2023-08-14 Multi-target vehicle tracking method in expressway scene based on deep SORT

Publications (1)

Publication Number Publication Date
CN117078718A 2023-11-17

Family

ID=88709074

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311020638.7A Pending CN117078718A (en) 2023-08-14 2023-08-14 Multi-target vehicle tracking method in expressway scene based on deep SORT

Country Status (1)

Country Link
CN (1) CN117078718A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118522039A (en) * 2024-07-23 2024-08-20 南京信息工程大学 Frame extraction pedestrian retrieval method based on YOLOv s and stage type regular combined pedestrian re-recognition


Similar Documents

Publication Publication Date Title
KR102129893B1 (en) Ship tracking method and system based on deep learning network and average movement
CN109784162B (en) Pedestrian behavior recognition and trajectory tracking method
CN109948582B (en) A vehicle retrograde intelligent detection method based on tracking trajectory analysis
CN108875588A (en) Across camera pedestrian detection tracking based on deep learning
CN101616309B (en) Non-overlapping visual field multiple-camera human body target tracking method
CN101751677B (en) Target continuous tracking method based on multi-camera
CN110660082A (en) A target tracking method based on graph convolution and trajectory convolution network learning
CN103986910A (en) A method and system for counting passenger flow based on intelligent analysis camera
Hsu et al. Passenger flow counting in buses based on deep learning using surveillance video
Makhmutova et al. Object tracking method for videomonitoring in intelligent transport systems
Chang et al. Video analytics in smart transportation for the AIC'18 challenge
Sairam et al. Automated vehicle parking slot detection system using deep learning
Zang et al. Object classification and tracking in video surveillance
CN113763427B (en) Multi-target tracking method based on coarse-to-fine shielding processing
Li et al. Bi-directional dense traffic counting based on spatio-temporal counting feature and counting-LSTM network
CN114926422B (en) Method and system for detecting passenger flow of getting on and off vehicles
Wang et al. Pedestrian abnormal event detection based on multi-feature fusion in traffic video
CN112507859B (en) Visual tracking method for mobile robot
Perera et al. Vehicle tracking based on an improved DeepSORT algorithm and the YOLOv4 framework
CN117078718A (en) Multi-target vehicle tracking method in expressway scene based on deep SORT
Yadav et al. An efficient YOLOv7 and Deep Sort are used in a deep learning model for tracking vehicle and detection
NGENI et al. Multiple object tracking (Mot) of vehicles to solve vehicle occlusion problems using deepsort and quantum computing
Chen et al. Intrusion detection of specific area based on video
Ibrahim et al. Performance Analysis of YOLO-DeepSORT on Thermal Video-Based Online Multi-Object Tracking
Jamiya et al. An efficient algorithm for real-time vehicle detection using deep neural networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination