CN118314363B - Target tracking method, device, storage medium and computer equipment - Google Patents
- Publication number: CN118314363B (application CN202410718073.8A)
- Authority: CN (China)
- Prior art keywords: target, screening, image, screened, tracked
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06V10/62 — Extraction of image or video features relating to a temporal dimension, e.g. time-based feature extraction; pattern tracking
- G06T7/246 — Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
- G06T7/62 — Analysis of geometric attributes of area, perimeter, diameter or volume
- G06V10/454 — Integrating filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
- G06V10/761 — Image or video pattern matching; proximity, similarity or dissimilarity measures in feature spaces
- G06V10/762 — Pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
Abstract
The invention discloses a target tracking method, a target tracking device, a storage medium and computer equipment, relates to the field of information technology, and can improve the efficiency and accuracy of target tracking. The method comprises: acquiring reference frame image information, wherein the reference frame image information includes a target to be tracked drawn with an external reference detection frame, together with the reference area and reference aspect ratio of that frame; taking the next frame image after the reference frame image as the target frame image, drawing an external screening detection frame for each target to be screened in the target frame image, and determining the screening area and screening aspect ratio of each external screening detection frame; matching the similarity between the target to be tracked and each target to be screened based on the tracking image features of the image in the external reference detection frame, the reference area, the reference aspect ratio, the screening image features of the images in the external screening detection frames, the screening areas and the screening aspect ratios; and determining the target to be tracked in the target frame image based on the similarity matching result, and tracking it.
Description
Technical Field
The present invention relates to the field of information technology, and in particular to a target tracking method, an apparatus, a storage medium, and computer equipment.
Background
Moving object tracking has become a research hotspot, with wide application in visual navigation, robotics, intelligent transportation, public safety, and other fields.
Currently, object tracking is typically performed manually. With this approach, if the background of the image is complex, or if many similar objects appear in the image, the time needed to find the target in each image grows, so tracking efficiency is low; likewise, if the target's speed changes or the target is occluded, tracking errors occur.
Disclosure of Invention
The invention provides a target tracking method, a target tracking device, a storage medium and computer equipment, which aim mainly to improve the efficiency and accuracy of target tracking.
According to a first aspect of the present invention, there is provided a target tracking method comprising:
acquiring reference frame image information, wherein the reference frame image information comprises a target to be tracked drawn with an external reference detection frame, and the reference area and reference aspect ratio of the external reference detection frame;
taking the next frame image corresponding to the reference frame image as a target frame image, determining, in the target frame image, each target to be screened of the same category as the target to be tracked, drawing an external screening detection frame corresponding to each target to be screened in the target frame image, and determining the screening area and screening aspect ratio of each external screening detection frame;
inputting the image corresponding to the external reference detection frame into a preset feature extraction model for feature extraction to obtain the tracking image features corresponding to the target to be tracked in the reference frame image, and inputting the images corresponding to the external screening detection frames into the preset feature extraction model for feature extraction to obtain the screening image features corresponding to each target to be screened in the target frame image;
judging whether the external screening detection frames corresponding to the targets to be screened in the target frame image overlap in region;
if the regions overlap, matching the similarity between the target to be tracked and each target to be screened based on the tracking image features, the reference area, the reference aspect ratio, the screening image features, the screening areas and the screening aspect ratios; otherwise, matching the similarity between the target to be tracked and each target to be screened based on the tracking image features and the screening image features;
and determining the target to be tracked among the targets to be screened contained in the target frame image based on the similarity matching result, and tracking it.
Optionally, determining the target to be tracked among the targets to be screened contained in the target frame image based on the similarity matching result, and tracking it, includes:
determining, based on the similarity matching result, whether the targets to be screened contained in the target frame image include the target to be tracked;
if the target to be tracked is included, tracking the target to be tracked in the target frame image, taking the target frame image as a new reference frame image, continuing to acquire the next frame image as a new target frame image, and performing target tracking on the new target frame image based on the target to be tracked in the new reference frame image;
and if the target to be tracked is not included, acquiring the next frame image corresponding to the target frame image as a new target frame image, and continuing to perform target tracking on the new target frame image based on the target to be tracked in the reference frame image.
Optionally, the preset feature extraction model includes a shallow convolution module and a deep convolution module; inputting the image corresponding to the external reference detection frame into a preset feature extraction model for feature extraction to obtain a tracking image feature corresponding to a target to be tracked in the reference frame image, wherein the method comprises the following steps:
inputting the image corresponding to the external reference detection frame to the shallow convolution module for feature extraction to obtain a shallow reference feature vector;
inputting the image corresponding to the external reference detection frame into the deep convolution module for feature extraction to obtain a deep reference feature vector;
determining tracking image features corresponding to a target to be tracked in the reference frame image based on the shallow reference feature vector and the deep reference feature vector;
Inputting the image corresponding to the external screening detection frame into a preset feature extraction model for feature extraction to obtain screening image features corresponding to the targets to be screened in the target frame image, wherein the method comprises the following steps:
inputting the image corresponding to the external screening detection frame to the shallow convolution module for feature extraction to obtain a shallow screening feature vector;
Inputting the image corresponding to the external screening detection frame into the deep convolution module for feature extraction to obtain a deep screening feature vector;
and determining screening image features corresponding to the targets to be screened in the target frame image based on the shallow screening feature vector and the deep screening feature vector.
Optionally, the determining, based on the shallow reference feature vector and the deep reference feature vector, a tracking image feature corresponding to a target to be tracked in the reference frame image includes:
Determining a reference center point coordinate of an external reference detection frame corresponding to the target to be tracked in the reference frame image;
Determining a shallow reference mapping relation between an image corresponding to the external reference detection frame and a corresponding reference feature map in the shallow convolution module, and determining a deep reference mapping relation between the image corresponding to the external reference detection frame and the corresponding reference feature map in the deep convolution module;
Determining shallow reference center point coordinates of an external reference detection frame corresponding to the target to be tracked in the shallow convolution module based on the shallow reference mapping relation and the reference center point coordinates, and determining deep reference center point coordinates of the external reference detection frame corresponding to the target to be tracked in the deep convolution module based on the deep reference mapping relation and the reference center point coordinates;
calculating a shallow reference distance between the reference center point coordinate and the shallow reference center point coordinate, and calculating a deep reference distance between the reference center point coordinate and the deep reference center point coordinate;
Determining a minimum reference distance from the shallow reference distance and the deep reference distance, and determining a reference feature vector under a convolution module corresponding to the minimum reference distance as a tracking image feature corresponding to a target to be tracked in the reference frame image;
The determining, based on the shallow layer screening feature vector and the deep layer screening feature vector, screening image features corresponding to the objects to be screened in the target frame image includes:
determining screening center point coordinates of an external screening detection frame corresponding to each target to be screened in the target frame image;
determining a shallow screening mapping relation between an image corresponding to the external screening detection frame and a corresponding screening feature image in the shallow convolution module, and determining a deep screening mapping relation between the image corresponding to the external screening detection frame and the corresponding screening feature image in the deep convolution module;
Determining shallow screening center point coordinates of an external screening detection frame corresponding to each target to be screened in the shallow convolution module based on the shallow screening mapping relation and the screening center point coordinates, and determining deep screening center point coordinates of an external screening detection frame corresponding to each target to be screened in the deep convolution module based on the deep screening mapping relation and the screening center point coordinates;
Calculating a shallow screening distance between the screening center point coordinates and the shallow screening center point coordinates, and calculating a deep screening distance between the screening center point coordinates and the deep screening center point coordinates;
And determining the minimum screening distance from the shallow screening distance and the deep screening distance, and determining the screening feature vector under the convolution module corresponding to the minimum screening distance as the screening image feature corresponding to each target to be screened in the target frame image.
Optionally, the judging whether the external screening detection frames corresponding to the targets to be screened in the target frame image overlap in region includes:
determining, based on the shallow screening center point coordinates of the external screening detection frames corresponding to the targets to be screened in the shallow convolution module, whether the shallow screening center points of any two external screening detection frames coincide;
if the shallow screening center points coincide, judging that the external screening detection frames corresponding to the targets to be screened overlap in region;
if the shallow screening center points do not coincide, determining, based on the deep screening center point coordinates of the external screening detection frames corresponding to the targets to be screened in the deep convolution module, whether the deep screening center points of any two external screening detection frames coincide;
if the deep screening center points coincide, judging that the external screening detection frames corresponding to the targets to be screened overlap in region;
and if the deep screening center points do not coincide, judging that the external screening detection frames corresponding to the targets to be screened do not overlap in region.
Optionally, the matching the similarity between the target to be tracked and each target to be screened based on the tracking image features, the reference area, the reference aspect ratio, the screening image features, the screening areas and the screening aspect ratios includes:
calculating the feature similarity between the target to be tracked and any target to be screened based on the tracking image features and the screening image features of that target to be screened;
determining the squared area difference between the reference area and the screening area corresponding to that target to be screened;
determining the squared aspect ratio difference between the reference aspect ratio and the screening aspect ratio corresponding to that target to be screened;
determining the evaluation weights corresponding to the feature similarity, the squared area difference and the squared aspect ratio difference;
and summing the feature similarity, the squared area difference and the squared aspect ratio difference, weighted by the evaluation weights, to obtain the similarity matching result between the target to be tracked and that target to be screened.
Optionally, the reference frame image information further includes non-tracking targets drawn with external non-tracking detection frames, each non-tracking target being of the same category as the target to be tracked; the matching the similarity between the target to be tracked and each target to be screened based on the tracking image features and the screening image features comprises the following steps:
Inputting the images corresponding to the external non-tracking detection frames into the preset feature extraction model to perform feature extraction to obtain non-tracking image features corresponding to the non-tracking targets in the reference frame images;
Based on the non-tracking image characteristics, clustering each non-tracking target to obtain a clustering center, and determining centroid characteristics corresponding to the clustering center;
calculating the degree of dissimilarity between the clustering center and each target to be screened respectively based on the centroid characteristics and the screening image characteristics corresponding to each target to be screened;
Determining a target dissimilarity degree greater than a preset dissimilarity threshold value in each dissimilarity degree;
Determining target feature similarity greater than a preset similarity threshold value in the feature similarity between the target to be tracked and each target to be screened;
determining the overlapping screening targets, i.e. the targets to be screened that correspond both to a target dissimilarity degree and to a target feature similarity, and determining the target to be tracked in the target frame image for tracking based on the overlapping screening targets;
the clustering processing is performed on each non-tracking target based on the non-tracking image characteristics to obtain a clustering center, and the clustering method comprises the following steps:
Initializing centroids corresponding to different clusters, and determining centroid vectors corresponding to the centroids;
Calculating distances between centroids corresponding to the non-tracking targets and the different clusters based on the non-tracking image features and the centroid vectors, and dividing the non-tracking targets into the different clusters based on the distances;
Determining updated centroids corresponding to the different clusters based on non-tracking image features corresponding to non-tracking targets in the different clusters;
and dividing the non-tracking targets into the different clusters again based on the non-tracking image features and the centroid vectors corresponding to the updated centroids, until the updated centroids no longer change, and determining the final updated centroids as the clustering centers; this is essentially a k-means procedure, sketched below.
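A minimal k-means sketch of the clustering steps above, operating on the non-tracking image feature vectors; the initialization scheme and iteration cap are illustrative assumptions, not fixed by the patent:

```python
import numpy as np

def cluster_non_tracking_features(features: np.ndarray, k: int, iters: int = 100):
    """Plain k-means over non-tracking image features; returns the centroid
    (clustering center) vectors. features has shape (num_targets, feat_dim)."""
    rng = np.random.default_rng(0)
    centroids = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(iters):
        # Divide each non-tracking target into the cluster of its nearest centroid.
        dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update each centroid from the features of the targets in its cluster.
        updated = np.array([features[labels == j].mean(axis=0) if np.any(labels == j)
                            else centroids[j] for j in range(k)])
        if np.allclose(updated, centroids):  # centroids no longer change
            break
        centroids = updated
    return centroids
```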
According to a second aspect of the present invention, there is provided an object tracking apparatus comprising:
an acquisition unit, configured to acquire reference frame image information, wherein the reference frame image information comprises a target to be tracked drawn with an external reference detection frame, and the reference area and reference aspect ratio of the external reference detection frame;
a determining unit, configured to take the next frame image corresponding to the reference frame image as a target frame image, determine, in the target frame image, each target to be screened of the same category as the target to be tracked, draw an external screening detection frame corresponding to each target to be screened in the target frame image, and determine the screening area and screening aspect ratio of each external screening detection frame;
a feature extraction unit, configured to input the image corresponding to the external reference detection frame into a preset feature extraction model for feature extraction to obtain the tracking image features corresponding to the target to be tracked in the reference frame image, and input the images corresponding to the external screening detection frames into the preset feature extraction model for feature extraction to obtain the screening image features corresponding to each target to be screened in the target frame image;
a judging unit, configured to judge whether the external screening detection frames corresponding to the targets to be screened in the target frame image overlap in region;
a matching unit, configured to, if the regions overlap, match the similarity between the target to be tracked and each target to be screened based on the tracking image features, the reference area, the reference aspect ratio, the screening image features, the screening areas and the screening aspect ratios, and otherwise to match the similarity between the target to be tracked and each target to be screened based on the tracking image features and the screening image features;
and a target tracking unit, configured to determine the target to be tracked among the targets to be screened contained in the target frame image based on the similarity matching result, and track it.
According to a third aspect of the present invention, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the above object tracking method.
According to a fourth aspect of the present invention there is provided a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the above target tracking method when executing the program.
According to the target tracking method, device, storage medium and computer equipment provided by the invention, compared with the current manner of manually tracking a target, the invention acquires reference frame image information, wherein the reference frame image information comprises a target to be tracked drawn with an external reference detection frame, and the reference area and reference aspect ratio of the external reference detection frame. The next frame image corresponding to the reference frame image is taken as the target frame image; each target to be screened of the same category as the target to be tracked is determined in the target frame image; an external screening detection frame corresponding to each target to be screened is drawn in the target frame image; and the screening area and screening aspect ratio of each external screening detection frame are determined. Meanwhile, the image corresponding to the external reference detection frame is input into a preset feature extraction model for feature extraction to obtain the tracking image features corresponding to the target to be tracked in the reference frame image, and the images corresponding to the external screening detection frames are input into the preset feature extraction model for feature extraction to obtain the screening image features corresponding to each target to be screened in the target frame image. It is then judged whether the external screening detection frames corresponding to the targets to be screened in the target frame image overlap in region. If the regions overlap, the similarity between the target to be tracked and each target to be screened is matched based on the tracking image features, the reference area, the reference aspect ratio, the screening image features, the screening areas and the screening aspect ratios; otherwise, the similarity is matched based on the tracking image features and the screening image features alone. Finally, the target to be tracked is determined among the targets to be screened contained in the target frame image based on the similarity matching result and is tracked. Because similarity matching comprehensively analyzes the image features together with the area and aspect ratio of the detection frames, the accuracy of similarity matching, and hence of target tracking, is improved, and the time spent searching for the tracked target among multiple targets is saved.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
FIG. 1 is a flowchart of a target tracking method according to an embodiment of the present invention;
FIG. 2 is a flowchart of another target tracking method according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a target tracking apparatus according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of another target tracking apparatus according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of the physical structure of a computer device according to an embodiment of the present invention.
Detailed Description
The application will be described in detail hereinafter with reference to the drawings in conjunction with embodiments. It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other.
At present, targets are tracked manually. With this approach, if the background of the image is complex or many similar objects appear in the image, the time needed to find the target in the image grows, so tracking efficiency is low; likewise, if the target's speed changes or occlusion occurs, target tracking errors result.
In order to solve the above problem, an embodiment of the present invention provides a target tracking method, as shown in fig. 1, including:
101. Acquiring reference frame image information, wherein the reference frame image information comprises a target to be tracked drawn with an external reference detection frame, and the reference area and reference aspect ratio of the external reference detection frame.
In the embodiment of the invention, a reference frame image is first acquired and preprocessed. For example, the reference frame image is an original RGB (Red, Green, Blue) image, and each pixel is normalized to the range [-1, 1] by the formula rgb = rgb / 127.5 - 1, yielding the preprocessed reference frame image; every subsequent frame may be preprocessed in the same way. The reference frame image contains various image information, such as a target to be tracked drawn with an external reference detection frame, and the reference area and reference aspect ratio of that frame. The target to be tracked in the reference frame image information may be specified manually; its external reference detection frame is then drawn and its coordinate position recorded as (box0_x0, box0_y0, box0_x1, box0_y1), from which the area and aspect ratio of the external reference detection frame, i.e. the reference area and reference aspect ratio, can be calculated. Each target to be screened in the next frame image corresponding to the reference frame image is then matched for similarity against the target to be tracked in the reference frame image, and the target to be tracked is finally determined in that next frame according to the similarity matching result. This avoids both the time wasted in manually locating the tracked target in every frame and the lower accuracy of manual tracking.
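A minimal sketch of the preprocessing and box-geometry computations described above; the function names and the (box0_x0, box0_y0, box0_x1, box0_y1) corner convention are illustrative assumptions:

```python
import numpy as np

def preprocess_frame(rgb: np.ndarray) -> np.ndarray:
    """Normalize an 8-bit RGB frame to [-1, 1] via rgb / 127.5 - 1."""
    return rgb.astype(np.float32) / 127.5 - 1.0

def box_area_and_aspect(x0: float, y0: float, x1: float, y1: float):
    """Area and aspect ratio of a detection frame given its corner coordinates,
    e.g. the reference area and reference aspect ratio of the reference frame."""
    w, h = x1 - x0, y1 - y0
    return w * h, w / h
```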
102. Taking the next frame image corresponding to the reference frame image as a target frame image, determining, in the target frame image, each target to be screened of the same category as the target to be tracked, drawing an external screening detection frame corresponding to each target to be screened in the target frame image, and determining the screening area and screening aspect ratio of each external screening detection frame.
In the embodiment of the invention, the target frame image is likewise preprocessed in advance to obtain a preprocessed target frame image. The target frame image may contain many objects. For example, if the target to be tracked is a car with license plate number A and the target frame image is a photograph of vehicles driving on a road, the target frame image also contains pedestrians, bicycles and other objects. In that case, every object of the same category as the target to be tracked (every target to be screened) must be determined in the target frame image: each car with another license plate number is identified, a circumscribing rectangular box (external screening detection frame) is drawn for each car, and the area and aspect ratio of each external screening detection frame, i.e. the screening area and screening aspect ratio, are determined from the position coordinates of that frame. Specifically, a YOLO model may be used to draw the external screening detection frames for the targets to be screened in the target frame image.
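A hedged sketch of this detection step, assuming the ultralytics YOLO package and an illustrative weights file; the patent only names "a YOLO model", and any detector that returns class-labelled boxes would serve:

```python
from ultralytics import YOLO  # assumed detector implementation

model = YOLO("yolov8n.pt")  # illustrative weights file

def detect_candidates(frame, target_class_id: int):
    """Return (x0, y0, x1, y1) external screening detection frames for every
    detected object of the same class as the target to be tracked."""
    result = model(frame)[0]
    boxes = []
    for xyxy, cls in zip(result.boxes.xyxy.tolist(), result.boxes.cls.tolist()):
        if int(cls) == target_class_id:
            boxes.append(tuple(xyxy))
    return boxes
```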
103. Inputting the image corresponding to the external reference detection frame into a preset feature extraction model for feature extraction to obtain the tracking image features corresponding to the target to be tracked in the reference frame image, and inputting the images corresponding to the external screening detection frames into the preset feature extraction model for feature extraction to obtain the screening image features corresponding to each target to be screened in the target frame image.
In the embodiment of the invention, in order to improve the feature extraction precision of the preset feature extraction model, the model must be trained and constructed in advance. The specific construction method is as follows: construct at least one preset initial feature extraction model; acquire sample images corresponding to sample external detection frames together with their actual image features; form a data set from these sample images and their actual image features, and divide the data set into several groups of training data and several groups of verification data according to the number of preset initial feature extraction models; train each preset initial feature extraction model with its group of training data, taking the sample image corresponding to the sample external detection frame as input data and the actual image features as output data, to obtain the trained preset initial feature extraction models; verify each trained model with its group of verification data to obtain the feature extraction accuracy of each trained model; and determine the trained model with the maximum feature extraction accuracy as the preset feature extraction model. In yet another embodiment of the invention, the preset feature extraction model may also be a partial structure of a YOLO (You Only Look Once, a deep learning model) model. The image corresponding to the external reference detection frame in the reference frame image is input into the preset feature extraction model, which outputs the tracking image features of the target to be tracked; the images corresponding to the external screening detection frames in the target frame image are each input into the preset feature extraction model, which outputs the screening image features of each target to be screened. By comprehensively analyzing the image features of the target to be tracked, the area and aspect ratio of the external reference detection frame, the image features of each target to be screened, and the areas and aspect ratios of the external screening detection frames, and determining the target to be tracked in the target frame image from the comprehensive analysis result, both tracking efficiency and tracking accuracy can be improved.
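A schematic of the train-validate-select loop just described; every callable here is a placeholder, since the patent does not fix a training framework:

```python
def select_feature_extractor(candidates, train_groups, val_groups, train_fn, accuracy_fn):
    """Train each candidate preset initial feature extraction model on its own
    group of training data, validate it, and keep the one with the highest
    feature extraction accuracy as the preset feature extraction model."""
    best_model, best_acc = None, float("-inf")
    for model, train_data, val_data in zip(candidates, train_groups, val_groups):
        trained = train_fn(model, train_data)    # sample crops in, actual features out
        acc = accuracy_fn(trained, val_data)     # feature extraction accuracy
        if acc > best_acc:
            best_model, best_acc = trained, acc
    return best_model
```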
104. Judging whether the external screening detection frames corresponding to the targets to be screened in the target frame image overlap in region.
105. If the regions overlap, matching the similarity between the target to be tracked and each target to be screened based on the tracking image features, the reference area, the reference aspect ratio, the screening image features, the screening areas and the screening aspect ratios; otherwise, matching the similarity between the target to be tracked and each target to be screened based on the tracking image features and the screening image features.
In the embodiment of the invention, after the external screening detection frames have been drawn for the targets to be screened in the target frame image, it must be judged, in order to improve target tracking accuracy, whether those frames overlap in region. If two external screening detection frames overlap, the image features of the images in the two frames are similar, and similarity matching based on image features alone is not accurate enough. Therefore, when matching the similarity between the target to be tracked in the reference frame image and each target to be screened in the target frame image, the area and aspect ratio of the external reference detection frame and the areas and aspect ratios of the external screening detection frames must also be taken into account, so that the target to be tracked is determined in the target frame image by comprehensively analyzing image features together with the area and aspect ratio of the detection frames, improving tracking accuracy.
In yet another embodiment of the invention, if the external screening detection frames corresponding to the targets to be screened do not overlap in region, the similarity between the target to be tracked and each target to be screened is matched using only the tracking image features of the target to be tracked in the reference frame image and the screening image features of each target to be screened in the target frame image, and the target to be tracked is finally determined in the target frame image according to the similarity matching result. Thus, when no region overlap exists, the target to be tracked is determined from the image features of the two frames alone, saving computing resources and improving target tracking efficiency.
106. Determining the target to be tracked among the targets to be screened contained in the target frame image based on the similarity matching result, and tracking it.
In the embodiment of the invention, after the target to be tracked in the reference frame image has been matched for similarity against each target to be screened in the target frame image, the target whose similarity to the target to be tracked meets a preset requirement is found in the target frame image according to the matching result, is taken as the target to be tracked in the target frame image, and target tracking continues with it.
In summary, compared with the current manner of manually tracking a target, the method provided by the embodiment of the invention matches the similarity between the target to be tracked and each target to be screened using the image features together with the areas and aspect ratios of the external detection frames when the screening frames overlap in region, or using the image features alone when they do not, and determines the target to be tracked among the targets to be screened according to the similarity matching result. Comprehensively analyzing image features, detection-frame area and aspect ratio improves the accuracy of similarity matching and hence of target tracking, while saving the time spent searching for the tracked target among multiple targets.
Further, in order to better illustrate the above process of tracking the target, as a refinement and extension of the above embodiment, an embodiment of the present invention provides another target tracking method, as shown in fig. 2, where the method includes:
201. Acquiring reference frame image information, wherein the reference frame image information comprises a target to be tracked drawn with an external reference detection frame, and the reference area and reference aspect ratio of the external reference detection frame.
Specifically, a target to be tracked is designated in the reference frame image, the external reference detection frame corresponding to the target to be tracked is drawn, and the reference area and reference aspect ratio of the external reference detection frame are determined; this information together constitutes the reference frame image information.
202. Taking the next frame image corresponding to the reference frame image as the target frame image, determining, in the target frame image, each target to be screened of the same category as the target to be tracked, drawing an external screening detection frame corresponding to each target to be screened in the target frame image, and determining the screening area and screening aspect ratio of each external screening detection frame.
203. Inputting the image corresponding to the external reference detection frame into a preset feature extraction model for feature extraction to obtain the tracking image features corresponding to the target to be tracked in the reference frame image, and inputting the images corresponding to the external screening detection frames into the preset feature extraction model for feature extraction to obtain the screening image features corresponding to each target to be screened in the target frame image.
Specifically, since the target to be tracked is specified in the reference frame image, the target most similar to it must be found in the next frame image (the target frame image) and taken as the target to be tracked in that frame. Therefore, every target to be screened of the same category as the target to be tracked is first found in the target frame image, the external screening detection frame corresponding to each target to be screened is drawn, and the screening area and screening aspect ratio of each external screening detection frame are determined; meanwhile, the tracking image features of the target to be tracked and the screening image features of each target to be screened must be extracted. Based on this, step 203 specifically includes: inputting the image corresponding to the external reference detection frame into the shallow convolution module for feature extraction to obtain a shallow reference feature vector; inputting the image corresponding to the external reference detection frame into the deep convolution module for feature extraction to obtain a deep reference feature vector; and determining the tracking image features corresponding to the target to be tracked in the reference frame image based on the shallow reference feature vector and the deep reference feature vector. Determining the tracking image features based on these two vectors includes: determining the reference center point coordinates of the external reference detection frame corresponding to the target to be tracked in the reference frame image; determining the shallow reference mapping relation between the image corresponding to the external reference detection frame and the corresponding reference feature map in the shallow convolution module, and the deep reference mapping relation between that image and the corresponding reference feature map in the deep convolution module; determining, from the shallow reference mapping relation and the reference center point coordinates, the shallow reference center point coordinates of the external reference detection frame in the shallow convolution module, and, from the deep reference mapping relation and the reference center point coordinates, the deep reference center point coordinates in the deep convolution module; calculating the shallow reference distance between the reference center point coordinates and the shallow reference center point coordinates, and the deep reference distance between the reference center point coordinates and the deep reference center point coordinates; and determining the minimum of the shallow and deep reference distances, and taking the reference feature vector of the convolution module corresponding to that minimum distance as the tracking image features corresponding to the target to be tracked in the reference frame image.
Further, the method for extracting the screening image features corresponding to each target to be screened in the target frame image is as follows: input the image corresponding to the external screening detection frame into the shallow convolution module for feature extraction to obtain a shallow screening feature vector; input that image into the deep convolution module for feature extraction to obtain a deep screening feature vector; and determine the screening image features corresponding to each target to be screened based on the shallow screening feature vector and the deep screening feature vector. This determination includes: determining the screening center point coordinates of the external screening detection frame corresponding to each target to be screened in the target frame image; determining the shallow screening mapping relation between the image corresponding to the external screening detection frame and the corresponding screening feature map in the shallow convolution module, and the deep screening mapping relation between that image and the corresponding screening feature map in the deep convolution module; determining, from the shallow screening mapping relation and the screening center point coordinates, the shallow screening center point coordinates of each external screening detection frame in the shallow convolution module, and, from the deep screening mapping relation and the screening center point coordinates, the deep screening center point coordinates in the deep convolution module; calculating the shallow screening distance between the screening center point coordinates and the shallow screening center point coordinates, and the deep screening distance between the screening center point coordinates and the deep screening center point coordinates; and determining the minimum of the shallow and deep screening distances, and taking the screening feature vector of the convolution module corresponding to that minimum distance as the screening image features corresponding to each target to be screened in the target frame image.
Specifically, the preset feature extraction model includes a shallow convolution module (the F0 module) and a deep convolution module (the F1 module), where the resolution of F1 may be set to 1/2 that of F0. The image corresponding to the external reference detection frame is input to the F0 module and the F1 module; the F0 module outputs the shallow reference feature vector and the F1 module outputs the deep reference feature vector. One of the two feature vectors must then be selected as the tracking image features of the target to be tracked, as follows. First determine the position coordinates of the target to be tracked in the reference frame image, for example (x0, y0; x1, y1). The center point coordinates of the external reference detection frame are then ox = (x1 + x0)/2 and oy = (y1 + y0)/2, where ox is the abscissa and oy the ordinate of the center point. If the input image (the image corresponding to the external reference detection frame) has size H×W×C, the F0 feature map has size h0×w0×c0 and the F1 feature map has size h1×w1×c1, where c0 and c1 are the lengths of the feature vectors, then the mapping relation between the image corresponding to the external reference detection frame and the corresponding reference feature map in the F0 module gives the position of (ox, oy) in the F0 module (the shallow reference center point coordinates) as (F0x, F0y), and the mapping relation in the F1 module gives the position of (ox, oy) in the F1 module (the deep reference center point coordinates) as (F1x, F1y). At this point (F0x, F0y) and (F1x, F1y) are strongly spatially correlated with the external reference detection frame (x0, y0; x1, y1). The distance between (F0x, F0y) and (ox, oy) (the shallow reference distance) and the distance between (F1x, F1y) and (ox, oy) (the deep reference distance) are then calculated, the convolution module with the smaller of the two distances is determined, and the image features output by that module are taken as the tracking image features of the target to be tracked. For example, if the shallow reference distance is smaller, the shallow reference feature vector output by the F0 module is taken as the tracking image features corresponding to the target to be tracked in the reference frame image.
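A sketch of this level-selection rule under stated assumptions: feature maps in (C, H, W) layout, centers mapped between image and feature-map space through a per-level stride (here 8 for F0 and 16 for F1, consistent with F1 having half the resolution of F0), and the box center assumed to lie inside both maps; all names are illustrative:

```python
import numpy as np

def pick_feature_level(box, feat_f0, feat_f1, stride_f0=8, stride_f1=16):
    """Choose the F0 (shallow) or F1 (deep) feature vector for one detection frame.

    The box center is mapped into each feature map through its stride and back to
    image coordinates; the level whose mapped center lands closest to the true
    center supplies the feature vector, mirroring the minimum-distance rule above."""
    x0, y0, x1, y1 = box
    ox, oy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    best_dist, best_vec = None, None
    for feat, stride in ((feat_f0, stride_f0), (feat_f1, stride_f1)):
        cx, cy = int(ox / stride), int(oy / stride)        # cell holding the center
        mx, my = (cx + 0.5) * stride, (cy + 0.5) * stride  # cell center, image space
        dist = np.hypot(mx - ox, my - oy)
        if best_dist is None or dist < best_dist:
            best_dist, best_vec = dist, feat[:, cy, cx]    # (C, H, W) indexing
    return best_vec
```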
Further, the images corresponding to the external screening detection frames are input to the F0 module and the F1 module; the F0 module outputs the shallow screening feature vector and the F1 module outputs the deep screening feature vector. One of the two must then be selected as the screening image features of the target to be screened, as follows. First determine the position coordinates of the target to be screened in the target frame image, for example (z0, q0; z1, q1). The center point coordinates of the external screening detection frame are then oz = (z1 + z0)/2 and oq = (q1 + q0)/2, where oz is the abscissa and oq the ordinate of the center point. The mapping relation between the image corresponding to the external screening detection frame and the corresponding screening feature map in the F0 module gives the position of (oz, oq) in the F0 module (the shallow screening center point coordinates) as (F0z, F0q), and the mapping relation in the F1 module gives the position of (oz, oq) in the F1 module (the deep screening center point coordinates) as (F1z, F1q). At this point (F0z, F0q) and (F1z, F1q) are strongly spatially correlated with the external screening detection frame (z0, q0; z1, q1). The distance between (F0z, F0q) and (oz, oq) (the shallow screening distance) and the distance between (F1z, F1q) and (oz, oq) (the deep screening distance) are then calculated, the convolution module with the smaller distance is determined, and the image features output by that module are taken as the screening image features of the target to be screened. In this way the screening image features corresponding to every target to be screened in the target frame image can be determined.
204. Judging whether the external screening detection frames corresponding to the targets to be screened in the target frame image have region overlapping.
For the embodiment of the present invention, in order to improve the target tracking efficiency and tracking accuracy, before performing similarity matching, it is first required to determine whether there is a region overlapping in the external screening detection frames corresponding to the targets to be screened in the target frame image, based on this, step 204 specifically includes: determining whether shallow screening center points of any two external screening detection frames in the external screening detection frames coincide or not based on shallow screening center point coordinates of the external screening detection frames corresponding to the targets to be screened in the shallow convolution module; if the shallow screening center points coincide, judging that the external screening detection frames corresponding to the targets to be screened have region overlapping; if the shallow screening center points are not coincident, determining whether the deep screening center points of any two external screening detection frames in the external screening detection frames are coincident or not based on the deep screening center point coordinates of the external screening detection frames corresponding to the targets to be screened in the deep convolution module; if the deep screening center points coincide, judging that the external screening detection frames corresponding to the targets to be screened have region overlapping; if the deep screening center points are not coincident, judging that the external screening detection frames corresponding to the targets to be screened are not overlapped in area.
Specifically, using the shallow screening center point coordinates obtained in step 203, it is first determined whether the shallow screening center points of any two external screening detection frames in the F0 module coincide. If they coincide, it is determined that the external screening detection frames corresponding to the targets to be screened have region overlapping. If no two shallow screening center points coincide, it must further be determined whether the deep screening center points of any two external screening detection frames in the F1 module coincide; if they coincide, it is determined that the external screening detection frames corresponding to the targets to be screened have region overlapping, and if no two deep screening center points coincide either, it is determined that there is no region overlapping between the external screening detection frames.
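The two-stage coincidence check can be sketched as follows, assuming integer feature-map coordinates so that two centers "coincide" when they fall on the same grid cell; names are illustrative.

```python
from itertools import combinations

def centers_coincide(centers):
    """True if any two feature-map centers fall on the same grid cell."""
    cells = [(int(x), int(y)) for x, y in centers]
    return any(a == b for a, b in combinations(cells, 2))

def frames_overlap(shallow_centers, deep_centers):
    """Two-stage region-overlap test over the screening detection frames:
    compare the shallow (F0) center points first, then the deep (F1) ones."""
    if centers_coincide(shallow_centers):
        return True      # shallow screening center points coincide
    if centers_coincide(deep_centers):
        return True      # deep screening center points coincide
    return False         # no coincidence at either scale: no region overlap
```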
205. If there is region overlapping, performing similarity matching on the target to be tracked and each target to be screened based on the tracking image features, the reference area, the reference length-width ratio, the screening image features, the screening area and the screening length-width ratio; otherwise, performing similarity matching on the target to be tracked and each target to be screened based on the tracking image features and the screening image features.
For the embodiment of the present invention, if there is a region overlapping between the external screening detection frames corresponding to each target to be screened, it is indicated that the image features of the images in the two detection frames with the region overlapping are similar, so that the similarity matching is not accurate enough only according to the image features, and at this time, comprehensive consideration needs to be performed by combining the area and the aspect ratio of the detection frames, based on this, step 205 specifically includes: calculating the feature similarity between the target to be tracked and any target to be screened based on the tracking image features and the screening image features of the target to be screened; determining the area square difference between the reference area and the screening area corresponding to any target to be screened; determining the square difference of the aspect ratio between the reference aspect ratio and the screening aspect ratio corresponding to any target to be screened; determining the evaluation weights corresponding to the feature similarity, the area square difference and the aspect ratio square difference; and adding the feature similarity, the area square difference and the aspect ratio square difference based on the evaluation weight to obtain a similarity matching result between the target to be tracked and any target to be screened.
Specifically, the feature similarity between the target to be tracked and each target to be screened is first calculated according to the tracking image feature and the screening image feature of each target to be screened; for example, the feature similarity may be a cosine similarity or be computed from a Euclidean distance. The similarity matching result between the target to be tracked and each target to be screened is then calculated according to the following formula:
w = λ1·ti + λ2·(so − Sni)² + λ3·(lo − Lni)²
wherein w represents the similarity matching result, i denotes the i-th target to be screened, ti represents the feature similarity between the i-th target to be screened and the target to be tracked, so represents the reference area of the external reference detection frame corresponding to the target to be tracked, lo represents the reference length-width ratio of the external reference detection frame corresponding to the target to be tracked, Sni represents the screening area of the external screening detection frame corresponding to the i-th target to be screened, Lni represents the screening length-width ratio of the external screening detection frame corresponding to the i-th target to be screened, and λ1, λ2 and λ3 represent the evaluation weights corresponding to the feature similarity, the area square difference and the aspect ratio square difference, respectively. The similarity matching result between the target to be tracked and each target to be screened can be calculated by this formula. Therefore, when the external screening detection frames overlap in region, similarity matching between the target to be tracked and each target to be screened is performed according to the area and length-width ratio of the external reference detection frame, the tracking image feature of the target to be tracked, the area and length-width ratio of each external screening detection frame, and the screening image feature of each target to be screened, so that the matching precision can be improved and thereby the target tracking accuracy can be improved.
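As an illustration, the following sketch computes this weighted matching score for one candidate, assuming cosine similarity for the feature term; the weight values are placeholders, and the negative signs on the difference terms are an assumption so that larger geometric differences lower the score.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def similarity_match_score(track_feat, screen_feat,
                           ref_area, ref_ratio,
                           screen_area, screen_ratio,
                           weights=(1.0, -0.5, -0.5)):
    """Weighted sum of feature similarity, squared area difference and
    squared length-width-ratio difference. The weight values and sign
    convention are assumptions; the patent leaves the evaluation
    weights open."""
    l1, l2, l3 = weights
    t = cosine_similarity(track_feat, screen_feat)
    return (l1 * t
            + l2 * (ref_area - screen_area) ** 2
            + l3 * (ref_ratio - screen_ratio) ** 2)
```

The candidate with the highest score among the targets to be screened would then be taken as the matched target.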
Further, if there is no region overlapping between the external screening detection frames corresponding to the targets to be screened, similarity matching is performed only according to image features; based on this, the method specifically includes: inputting the images corresponding to the external non-tracking detection frames into the preset feature extraction model to perform feature extraction to obtain non-tracking image features corresponding to the non-tracking targets in the reference frame image; based on the non-tracking image features, clustering each non-tracking target to obtain a clustering center, and determining the centroid feature corresponding to the clustering center; calculating the degree of dissimilarity between the clustering center and each target to be screened based on the centroid feature and the screening image feature corresponding to each target to be screened; determining the target dissimilarity degrees greater than a preset dissimilarity threshold among the degrees of dissimilarity; determining the target feature similarities greater than a preset similarity threshold among the feature similarities between the target to be tracked and the targets to be screened; determining an overlapped screening target among the targets to be screened corresponding to the target dissimilarity degrees and the targets to be screened corresponding to the target feature similarities, and determining the target to be tracked in the target frame image for tracking based on the overlapped screening target. The clustering of the non-tracking targets to obtain the clustering center specifically includes: initializing centroids corresponding to different clusters, and determining centroid vectors corresponding to the centroids; calculating distances between the non-tracking targets and the centroids corresponding to the different clusters based on the non-tracking image features and the centroid vectors, and dividing the non-tracking targets into the different clusters based on the distances; determining updated centroids corresponding to the different clusters based on the non-tracking image features corresponding to the non-tracking targets in the different clusters; and dividing the non-tracking targets into the different clusters again based on the non-tracking image features and the centroid vectors corresponding to the updated centroids, until the updated centroids no longer change, and determining the final updated centroid as the clustering center.
The preset dissimilarity threshold and the preset similarity threshold are set according to actual requirements. Specifically, the reference frame image information further includes each non-tracking target drawn with an external non-tracking detection frame, where each non-tracking target belongs to the same category as the target to be tracked; for example, if the target to be tracked is a car with license plate number A, the non-tracking targets are the cars with other license plates in the reference frame image. The images corresponding to the external non-tracking detection frames are then input into the preset feature extraction model for feature extraction to obtain the non-tracking image features corresponding to each non-tracking target in the reference frame image, where the specific extraction of the non-tracking image features is the same as that of the tracking image features. Each non-tracking target is then clustered according to its non-tracking image features to determine a clustering center. Specifically, K non-tracking image features are selected as the initial centroid vectors of K clusters; the distance (for example, the Euclidean distance) from each non-tracking image feature to each of the K centroid vectors is calculated, and each non-tracking target is assigned to the cluster whose centroid vector is nearest; the centroid vector of each cluster is then recomputed from the non-tracking image features assigned to it, and the non-tracking targets are re-assigned to clusters according to the updated centroid vectors; this is repeated until the centroid vectors no longer change, and the final centroid is determined as the clustering center, with the corresponding centroid feature. The degree of dissimilarity between the clustering center and each target to be screened is then calculated according to the centroid feature and the screening image feature corresponding to each target to be screened, the degree of similarity between each target to be screened and the target to be tracked is calculated, and the target to be tracked is determined in the target frame image according to the degrees of dissimilarity and similarity. For example, if the degree of dissimilarity between the cluster center and target to be screened A is 0.9, between the cluster center and target to be screened B is 0.4, between the cluster center and target to be screened C is 0.3, and between the cluster center and target to be screened D is 0.7, while the degree of similarity between the target to be tracked and target A is 0.98, target B is 0.7, target C is 0.55, and target D is 0.65, then with a preset dissimilarity threshold of 0.85 (which only target A exceeds), target to be screened A is finally determined as the target to be tracked in the target frame image.
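The sketch below illustrates this no-overlap matching path under stated assumptions: Euclidean-distance K-means over the non-tracking features, cosine similarity for the feature terms, and dissimilarity taken as one minus cosine similarity (the patent does not fix the dissimilarity measure); all names and threshold values are illustrative.

```python
import numpy as np

def kmeans(features, k, iters=100, seed=0):
    """Plain K-means over the non-tracking image features; returns the
    final centroid vectors (the clustering centers)."""
    rng = np.random.default_rng(seed)
    centroids = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(iters):
        # assign each feature to the nearest centroid (Euclidean distance)
        dists = np.linalg.norm(features[:, None] - centroids[None], axis=2)
        labels = dists.argmin(axis=1)
        updated = np.array([features[labels == j].mean(axis=0)
                            if np.any(labels == j) else centroids[j]
                            for j in range(k)])
        if np.allclose(updated, centroids):    # centroids no longer change
            break
        centroids = updated
    return centroids

def cos_sim(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def pick_targets(track_feat, screen_feats, centroid,
                 dissim_thr=0.85, sim_thr=0.9):
    """Intersect the candidates sufficiently dissimilar to the
    non-tracking cluster center with those sufficiently similar to
    the target to be tracked (the overlapped screening targets)."""
    picked = []
    for i, f in enumerate(screen_feats):
        dissim = 1.0 - cos_sim(centroid, f)    # assumed dissimilarity measure
        sim = cos_sim(track_feat, f)
        if dissim > dissim_thr and sim > sim_thr:
            picked.append(i)
    return picked
```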
206. Determining, based on the similarity matching result, whether the targets to be screened contained in the target frame image include the target to be tracked.
207. If the target to be tracked is included, tracking the target to be tracked in the target frame image, taking the target frame image as a new reference frame image, continuing to acquire the next frame image as a new target frame image, and performing target tracking on the new target frame image based on the target to be tracked in the new reference frame image.
208. If the target to be tracked is not included, acquiring the next frame image corresponding to the target frame image as a new target frame image, and continuing to perform target tracking on the new target frame image based on the target to be tracked in the reference frame image.
Specifically, if the target to be tracked is found in the target frame image according to the similarity matching result, the target to be tracked in the target frame image is tracked; meanwhile, the next frame image corresponding to the target frame image is acquired, the target frame image is taken as a new reference frame image, and the next frame image corresponding to the target frame image is taken as a new target frame image. The target to be tracked in the new reference frame image is then subjected to similarity matching with each target to be screened in the new target frame image in the manner described in the above steps, and the target to be tracked is finally determined in the new target frame image according to the similarity matching result for tracking; continuing in this way, the target to be tracked can be tracked in each frame image, and the tracking efficiency and tracking accuracy can be improved. Further, if the target to be tracked is not found in the target frame image according to the similarity matching result, the next frame image corresponding to the target frame image is acquired and determined as a new target frame image, and similarity matching continues to be performed between the target to be tracked in the previous reference frame image and each target to be screened in the new target frame image, with the target to be tracked finally determined in the new target frame image according to the similarity matching result. For example, if the reference frame image is denoted as image A, the next frame image corresponding to image A as image B, and the next frame image corresponding to image B as image C, then if the target to be tracked is detected in image B according to the target to be tracked in image A, tracking continues by detecting the target to be tracked in image C according to the target to be tracked in image B; if the target to be tracked is not detected in image B according to the target to be tracked in image A, the target to be tracked must then be detected and tracked in image C according to the target to be tracked in image A.
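The reference-frame update logic of steps 207-208 can be sketched as a simple loop; find_target stands in for the similarity-matching pipeline described above and is a hypothetical helper, as are the other names.

```python
def track_video(reference_frame, frames, target_info):
    """Frame-advance loop for steps 207-208: on a successful match the
    target frame is promoted to the new reference frame; on a miss the
    old reference frame is kept and matching continues with the next
    frame. find_target is a hypothetical similarity-matching helper."""
    tracks = []
    for frame in frames:                        # each frame is a target frame
        match = find_target(reference_frame, target_info, frame)
        if match is not None:
            tracks.append(match)
            reference_frame, target_info = frame, match   # step 207
        # step 208: on a miss, reference_frame and target_info are unchanged
    return tracks
```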
According to the other target tracking method provided by the invention, compared with the current mode of manually tracking the target, the method comprises the steps of acquiring reference frame image information, wherein the reference frame image information comprises a target to be tracked drawn with an external reference detection frame, and the reference area and reference length-width ratio of the external reference detection frame; taking the next frame image corresponding to the reference frame image as a target frame image, determining, in the target frame image, each target to be screened of the same category as the target to be tracked, drawing an external screening detection frame corresponding to each target to be screened in the target frame image, and determining the screening area and screening length-width ratio of the external screening detection frames; meanwhile, inputting the image corresponding to the external reference detection frame into a preset feature extraction model for feature extraction to obtain the tracking image feature corresponding to the target to be tracked in the reference frame image, and inputting the images corresponding to the external screening detection frames into the preset feature extraction model for feature extraction to obtain the screening image features corresponding to the targets to be screened in the target frame image; judging whether the external screening detection frames corresponding to the targets to be screened in the target frame image have region overlapping; then, if there is region overlapping, performing similarity matching on the target to be tracked and each target to be screened based on the tracking image feature, the reference area, the reference length-width ratio, the screening image features, the screening areas and the screening length-width ratios; otherwise, performing similarity matching on the target to be tracked and each target to be screened based on the tracking image feature and the screening image features; finally, determining, based on the similarity matching result, whether the targets to be screened contained in the target frame image include the target to be tracked; if the target to be tracked is included, tracking the target to be tracked in the target frame image, taking the target frame image as a new reference frame image, continuing to acquire the next frame image as a new target frame image, and performing target tracking on the new target frame image based on the target to be tracked in the new reference frame image; and if the target to be tracked is not included, acquiring the next frame image corresponding to the target frame image as a new target frame image, and continuing to perform target tracking on the new target frame image based on the target to be tracked in the reference frame image.
The method performs similarity matching between the target to be tracked and each target to be screened according to the image feature of the target to be tracked specified in the reference frame image, the reference area and reference length-width ratio of the external reference detection frame of the target to be tracked, the image features of the targets to be screened in the target frame image, and the screening area and screening length-width ratio of the external screening detection frame corresponding to each target to be screened, and determines the target to be tracked among the targets to be screened contained in the target frame image according to the similarity matching result. By comprehensively analyzing the image features together with the area and length-width ratio of the detection frames, the accuracy of similarity matching is improved, so the accuracy of target tracking can be improved; meanwhile, the time spent searching for the tracked target among a plurality of targets can be saved, so that the target tracking efficiency can be improved.
Further, as a specific implementation of fig. 1, an embodiment of the present invention provides a target tracking apparatus, as shown in fig. 3, where the apparatus includes: an acquisition unit 31, a determination unit 32, a feature extraction unit 33, a judgment unit 34, a matching unit 35, and a target tracking unit 36.
The obtaining unit 31 may be configured to obtain reference frame image information, where the reference frame image information includes a target to be tracked on which an external reference detection frame is drawn, and a reference area and a reference aspect ratio of the external reference detection frame.
The determining unit 32 may be configured to determine, in the target frame image, each target to be screened having the same category as the target to be tracked, and draw, in the target frame image, an external screening detection frame corresponding to each target to be screened, and determine a screening area and a screening aspect ratio of the external screening detection frame, with the next frame image corresponding to the reference frame image being a target frame image.
The feature extraction unit 33 may be configured to input an image corresponding to the external reference detection frame into a preset feature extraction model to perform feature extraction, obtain a tracked image feature corresponding to a target to be tracked in the reference frame image, and input an image corresponding to the external screening detection frame into the preset feature extraction model to perform feature extraction, so as to obtain a screened image feature corresponding to each target to be screened in the target frame image.
The judging unit 34 may be configured to judge whether there is a region overlapping between the external screening detection frames corresponding to the objects to be screened in the target frame image.
The matching unit 35 may be configured to, if there is a region overlapping, perform similarity matching on the target to be tracked and each target to be screened based on the tracking image feature, the reference area, the reference aspect ratio, the screening image feature, the screening area, and the screening aspect ratio; otherwise, matching the target to be tracked with each target to be screened in similarity based on the tracking image features and the screening image features.
The target tracking unit 36 may be configured to determine, based on a similarity matching result, a target to be tracked from the targets to be screened included in the target frame image, for tracking.
In a specific application scenario, in order to determine a target to be tracked in a target frame image for tracking, as shown in fig. 4, the target tracking unit 36 includes a first determining module 361 and a target tracking module 362.
The first determining module 361 may be configured to determine, based on a similarity matching result, whether each target to be screened included in the target frame image includes the target to be tracked.
The target tracking module 362 may be configured to, if the target to be tracked is included, track the target to be tracked in the target frame image, take the target frame image as a new reference frame image, continue to acquire the next frame image as a new target frame image, and perform target tracking on the new target frame image based on the target to be tracked in the new reference frame image.
The target tracking module 362 may be further configured to, if the target to be tracked is not included, acquire the next frame image corresponding to the target frame image as a new target frame image, and continue to perform target tracking on the new target frame image based on the target to be tracked in the reference frame image.
In a specific application scenario, in order to determine the image features, the feature extraction unit 33 includes a first extraction module 331 and a second determination module 332.
The first extraction module 331 may be configured to input an image corresponding to the external reference detection frame to the shallow convolution module for feature extraction, so as to obtain a shallow reference feature vector.
The first extraction module 331 may be further configured to input an image corresponding to the external reference detection frame to the deep convolution module for feature extraction, so as to obtain a deep reference feature vector.
The second determining module 332 may be configured to determine a tracking image feature corresponding to the target to be tracked in the reference frame image based on the shallow reference feature vector and the deep reference feature vector.
The first extraction module 331 may be further configured to input an image corresponding to the external screening detection frame to the shallow convolution module for feature extraction, so as to obtain a shallow screening feature vector.
The first extraction module 331 may be further configured to input an image corresponding to the external screening detection frame to the deep convolution module for feature extraction, so as to obtain a deep screening feature vector.
The second determining module 332 may be further configured to determine a screening image feature corresponding to each target to be screened in the target frame image based on the shallow screening feature vector and the deep screening feature vector.
In a specific application scenario, in order to determine the image feature, the second determining module 332 may be specifically configured to determine a reference center point coordinate of an external reference detection frame corresponding to the target to be tracked in the reference frame image; determining a shallow reference mapping relation between an image corresponding to the external reference detection frame and a corresponding reference feature map in the shallow convolution module, and determining a deep reference mapping relation between the image corresponding to the external reference detection frame and the corresponding reference feature map in the deep convolution module; determining shallow reference center point coordinates of an external reference detection frame corresponding to the target to be tracked in the shallow convolution module based on the shallow reference mapping relation and the reference center point coordinates, and determining deep reference center point coordinates of the external reference detection frame corresponding to the target to be tracked in the deep convolution module based on the deep reference mapping relation and the reference center point coordinates; calculating a shallow reference distance between the reference center point coordinate and the shallow reference center point coordinate, and calculating a deep reference distance between the reference center point coordinate and the deep reference center point coordinate; and determining a minimum reference distance from the shallow reference distance and the deep reference distance, and determining a reference feature vector under a convolution module corresponding to the minimum reference distance as a tracking image feature corresponding to a target to be tracked in the reference frame image.
The second determining module 332 may be specifically configured to determine a screening center point coordinate of an external screening detection frame corresponding to each target to be screened in the target frame image; determining a shallow screening mapping relation between an image corresponding to the external screening detection frame and a corresponding screening feature image in the shallow convolution module, and determining a deep screening mapping relation between the image corresponding to the external screening detection frame and the corresponding screening feature image in the deep convolution module; determining shallow screening center point coordinates of an external screening detection frame corresponding to each target to be screened in the shallow convolution module based on the shallow screening mapping relation and the screening center point coordinates, and determining deep screening center point coordinates of an external screening detection frame corresponding to each target to be screened in the deep convolution module based on the deep screening mapping relation and the screening center point coordinates; calculating a shallow screening distance between the screening center point coordinates and the shallow screening center point coordinates, and calculating a deep screening distance between the screening center point coordinates and the deep screening center point coordinates; and determining the minimum screening distance from the shallow screening distance and the deep screening distance, and determining the screening feature vector under the convolution module corresponding to the minimum screening distance as the screening image feature corresponding to each target to be screened in the target frame image.
In a specific application scenario, in order to determine whether there is a region overlapping between the external screening detection frames corresponding to the objects to be screened in the target frame image, the determining unit 34 may specifically be configured to determine whether the shallow screening center points of any two external screening detection frames in each external screening detection frame overlap based on the shallow screening center point coordinates of the external screening detection frame corresponding to the objects to be screened in the shallow convolution module; if the shallow screening center points coincide, judging that the external screening detection frames corresponding to the targets to be screened have region overlapping; if the shallow screening center points are not coincident, determining whether the deep screening center points of any two external screening detection frames in the external screening detection frames are coincident or not based on the deep screening center point coordinates of the external screening detection frames corresponding to the targets to be screened in the deep convolution module; if the deep screening center points coincide, judging that the external screening detection frames corresponding to the targets to be screened have region overlapping; if the deep screening center points are not coincident, judging that the external screening detection frames corresponding to the targets to be screened are not overlapped in area.
In a specific application scenario, in order to perform similarity matching between the target to be tracked and each target to be screened, the matching unit 35 includes a calculating module 351, a third determining module 352, and an adding module 353.
The calculating module 351 may be configured to calculate, based on the tracking image feature and the screening image feature of any one of the targets to be screened, feature similarity between the target to be tracked and the any one of the targets to be screened.
The third determining module 352 may be configured to determine an area squared difference between the reference area and a screening area corresponding to the any target to be screened.
The third determining module 352 may be further configured to determine an aspect ratio square difference between the reference aspect ratio and a screening aspect ratio corresponding to the any target to be screened.
The third determining module 352 may be further configured to determine an evaluation weight corresponding to the feature similarity, the area square difference, and the aspect ratio square difference.
The adding module 353 may be configured to add the feature similarity, the area square difference, and the aspect ratio square difference based on the evaluation weight, to obtain a similarity matching result between the target to be tracked and the any target to be screened.
In a specific application scenario, in order to perform similarity matching between the target to be tracked and each target to be screened, the matching unit 35 further includes a second extraction module 354 and a clustering module 355.
The second extraction module 354 may be configured to input an image corresponding to each external non-tracking detection frame into the preset feature extraction model to perform feature extraction, so as to obtain a non-tracking image feature corresponding to each non-tracking target in the reference frame image.
The clustering module 355 may be configured to perform clustering processing on each of the non-tracking targets based on the non-tracking image features, to obtain a cluster center, and determine a centroid feature corresponding to the cluster center.
The calculating module 351 may be further configured to calculate a degree of dissimilarity between the clustering center and each of the objects to be screened, based on the centroid feature and the screening image feature corresponding to each of the objects to be screened.
The third determining module 352 may be further configured to determine, among the degrees of dissimilarity, a target degree of dissimilarity that is greater than a preset dissimilarity threshold.
The third determining module 352 may be further configured to determine, among feature similarities between the target to be tracked and the targets to be screened, a feature similarity of the target that is greater than a preset similarity threshold.
The third determining module 352 may be further configured to determine an overlapped screening target among the targets to be screened corresponding to the target dissimilarity degree and the targets to be screened corresponding to the target feature similarity, and determine the target to be tracked in the target frame image based on the overlapped screening target for tracking.
In a specific application scenario, in order to perform clustering processing on each non-tracking target to obtain a clustering center, the clustering module 355 may be specifically configured to initialize centroids corresponding to different clusters, and determine centroid vectors corresponding to the centroids; calculating distances between centroids corresponding to the non-tracking targets and the different clusters based on the non-tracking image features and the centroid vectors, and dividing the non-tracking targets into the different clusters based on the distances; determining updated centroids corresponding to the different clusters based on non-tracking image features corresponding to non-tracking targets in the different clusters; and dividing the non-tracking target into different clusters again based on the non-tracking image characteristics and the centroid vector corresponding to the updated centroid until the updated centroid does not change, and determining the final updated centroid as the clustering center.
It should be noted that, for other corresponding descriptions of each functional module related to the object tracking device provided by the embodiment of the present invention, reference may be made to corresponding descriptions of the method shown in fig. 1, which are not repeated herein.
Based on the above method as shown in fig. 1, correspondingly, the embodiment of the present invention further provides a computer readable storage medium, on which a computer program is stored, which when executed by a processor implements the following steps: acquiring reference frame image information, wherein the reference frame image information comprises a target to be tracked drawn with an external reference detection frame, and the reference area and reference length-width ratio of the external reference detection frame; taking the next frame image corresponding to the reference frame image as a target frame image, determining, in the target frame image, each target to be screened of the same category as the target to be tracked, drawing an external screening detection frame corresponding to each target to be screened in the target frame image, and determining the screening area and screening length-width ratio of the external screening detection frames; inputting the image corresponding to the external reference detection frame into a preset feature extraction model for feature extraction to obtain the tracking image feature corresponding to the target to be tracked in the reference frame image, and inputting the images corresponding to the external screening detection frames into the preset feature extraction model for feature extraction to obtain the screening image features corresponding to the targets to be screened in the target frame image; judging whether the external screening detection frames corresponding to the targets to be screened in the target frame image have region overlapping; if there is region overlapping, performing similarity matching on the target to be tracked and each target to be screened based on the tracking image feature, the reference area, the reference length-width ratio, the screening image features, the screening areas and the screening length-width ratios; otherwise, performing similarity matching on the target to be tracked and each target to be screened based on the tracking image feature and the screening image features; and determining the target to be tracked among the targets to be screened contained in the target frame image based on the similarity matching result for tracking.
Based on the embodiment of the method shown in fig. 1 and the device shown in fig. 3, the embodiment of the invention further provides a physical structure diagram of a computer device, as shown in fig. 5, where the computer device includes: a processor 41, a memory 42, and a computer program stored on the memory 42 and executable on the processor, wherein the memory 42 and the processor 41 are both arranged on a bus 43, and the processor 41 performs the following steps when the program is executed: acquiring reference frame image information, wherein the reference frame image information comprises a target to be tracked drawn with an external reference detection frame, and the reference area and reference length-width ratio of the external reference detection frame; taking the next frame image corresponding to the reference frame image as a target frame image, determining, in the target frame image, each target to be screened of the same category as the target to be tracked, drawing an external screening detection frame corresponding to each target to be screened in the target frame image, and determining the screening area and screening length-width ratio of the external screening detection frames; inputting the image corresponding to the external reference detection frame into a preset feature extraction model for feature extraction to obtain the tracking image feature corresponding to the target to be tracked in the reference frame image, and inputting the images corresponding to the external screening detection frames into the preset feature extraction model for feature extraction to obtain the screening image features corresponding to the targets to be screened in the target frame image; judging whether the external screening detection frames corresponding to the targets to be screened in the target frame image have region overlapping; if there is region overlapping, performing similarity matching on the target to be tracked and each target to be screened based on the tracking image feature, the reference area, the reference length-width ratio, the screening image features, the screening areas and the screening length-width ratios; otherwise, performing similarity matching on the target to be tracked and each target to be screened based on the tracking image feature and the screening image features; and determining the target to be tracked among the targets to be screened contained in the target frame image based on the similarity matching result for tracking.
According to the technical scheme, the method comprises the steps of acquiring reference frame image information, wherein the reference frame image information comprises a target to be tracked drawn with an external reference detection frame, and the reference area and reference length-width ratio of the external reference detection frame; taking the next frame image corresponding to the reference frame image as a target frame image, determining, in the target frame image, each target to be screened of the same category as the target to be tracked, drawing an external screening detection frame corresponding to each target to be screened in the target frame image, and determining the screening area and screening length-width ratio of the external screening detection frames; meanwhile, inputting the image corresponding to the external reference detection frame into a preset feature extraction model for feature extraction to obtain the tracking image feature corresponding to the target to be tracked in the reference frame image, and inputting the images corresponding to the external screening detection frames into the preset feature extraction model for feature extraction to obtain the screening image features corresponding to the targets to be screened in the target frame image; judging whether the external screening detection frames corresponding to the targets to be screened in the target frame image have region overlapping; then, if there is region overlapping, performing similarity matching on the target to be tracked and each target to be screened based on the tracking image feature, the reference area, the reference length-width ratio, the screening image features, the screening areas and the screening length-width ratios; otherwise, performing similarity matching on the target to be tracked and each target to be screened based on the tracking image feature and the screening image features; and finally, determining the target to be tracked among the targets to be screened contained in the target frame image based on the similarity matching result for tracking. By performing similarity matching between the target to be tracked and each target to be screened according to the image feature of the target to be tracked, the reference area and reference length-width ratio of the external reference detection frame of the target to be tracked, the image features of the targets to be screened in the target frame image, and the screening area and screening length-width ratio of the external screening detection frame corresponding to each target to be screened, and finally determining the target to be tracked among the targets to be screened contained in the target frame image according to the similarity matching result, the similarity matching comprehensively analyzes the image features together with the area and length-width ratio of the detection frames, so that the accuracy of similarity matching can be improved, the accuracy of target tracking can be improved, and meanwhile the time spent searching for the tracked target among a plurality of targets can be saved.
It will be appreciated by those skilled in the art that the modules or steps of the invention described above may be implemented in a general purpose computing device, they may be concentrated on a single computing device, or distributed across a network of computing devices, they may alternatively be implemented in program code executable by computing devices, so that they may be stored in a memory device for execution by computing devices, and in some cases, the steps shown or described may be performed in a different order than that shown or described, or they may be separately fabricated into individual integrated circuit modules, or multiple modules or steps within them may be fabricated into a single integrated circuit module for implementation. Thus, the present invention is not limited to any specific combination of hardware and software.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (10)
1. A target tracking method, comprising:
Acquiring reference frame image information, wherein the reference frame image information comprises a target to be tracked drawn with an external reference detection frame, a reference area and a reference length-width ratio of the external reference detection frame, and non-tracking targets each drawn with an external non-tracking detection frame, the non-tracking targets being of the same category as the target to be tracked;
Taking the next frame image corresponding to the reference frame image as a target frame image, determining, in the target frame image, each target to be screened of the same category as the target to be tracked, drawing an external screening detection frame corresponding to each target to be screened in the target frame image, and determining the screening area and the screening length-width ratio of the external screening detection frame;
Inputting the image corresponding to the external reference detection frame into a preset feature extraction model for feature extraction to obtain tracking image features corresponding to the targets to be tracked in the reference frame image, and inputting the image corresponding to the external screening detection frame into the preset feature extraction model for feature extraction to obtain screening image features corresponding to the targets to be screened in the target frame image;
judging whether the external screening detection frames corresponding to the targets to be screened in the target frame image have region overlapping or not;
If there is region overlapping, performing similarity matching on the target to be tracked and each target to be screened based on the tracking image features, the reference area, the reference length-width ratio, the screening image features, the screening area and the screening length-width ratio; otherwise, performing similarity matching on the target to be tracked and each target to be screened based on the tracking image characteristics and the screening image characteristics;
determining a target to be tracked in each target to be screened contained in the target frame image based on a similarity matching result to track;
The determining the target to be tracked in each target to be screened contained in the target frame image based on the similarity matching result comprises the following steps:
Inputting the images corresponding to the external non-tracking detection frames into the preset feature extraction model to perform feature extraction to obtain non-tracking image features corresponding to the non-tracking targets in the reference frame images;
Based on the non-tracking image characteristics, clustering each non-tracking target to obtain a clustering center, and determining centroid characteristics corresponding to the clustering center;
calculating the degree of dissimilarity between the clustering center and each target to be screened respectively based on the centroid characteristics and the screening image characteristics corresponding to each target to be screened;
Determining a target dissimilarity degree greater than a preset dissimilarity threshold value in each dissimilarity degree;
Determining target feature similarity greater than a preset similarity threshold value in the feature similarity between the target to be tracked and each target to be screened;
And determining an overlapped screening target in targets to be screened corresponding to the target dissimilarity degree and targets to be screened corresponding to the target feature similarity, and determining the targets to be tracked in the target frame image for tracking based on the overlapped screening target.
2. The method according to claim 1, wherein determining the target to be tracked from the targets to be screened included in the target frame image based on the similarity matching result includes:
determining whether each target to be screened contained in the target frame image contains the target to be tracked or not based on a similarity matching result;
if the target to be tracked is included, tracking the target to be tracked in the target frame image, taking the target frame image as a new reference frame image, continuing to acquire the next frame image as a new target frame image, and performing target tracking on the new target frame image based on the target to be tracked in the new reference frame image;
And if the target to be tracked is not included, acquiring the next frame image corresponding to the target frame image as a new target frame image, and continuing to perform target tracking on the new target frame image based on the target to be tracked in the reference frame image.
3. The method of claim 1, wherein the pre-set feature extraction model comprises a shallow convolution module and a deep convolution module; inputting the image corresponding to the external reference detection frame into a preset feature extraction model for feature extraction to obtain a tracking image feature corresponding to a target to be tracked in the reference frame image, wherein the method comprises the following steps:
inputting the image corresponding to the external reference detection frame to the shallow convolution module for feature extraction to obtain a shallow reference feature vector;
inputting the image corresponding to the external reference detection frame into the deep convolution module for feature extraction to obtain a deep reference feature vector;
determining tracking image features corresponding to a target to be tracked in the reference frame image based on the shallow reference feature vector and the deep reference feature vector;
Inputting the image corresponding to the external screening detection frame into a preset feature extraction model for feature extraction to obtain screening image features corresponding to the targets to be screened in the target frame image, wherein the method comprises the following steps:
inputting the image corresponding to the external screening detection frame to the shallow convolution module for feature extraction to obtain a shallow screening feature vector;
Inputting the image corresponding to the external screening detection frame into the deep convolution module for feature extraction to obtain a deep screening feature vector;
and determining screening image features corresponding to the targets to be screened in the target frame image based on the shallow screening feature vector and the deep screening feature vector.
4. A method according to claim 3, wherein the determining, based on the shallow reference feature vector and the deep reference feature vector, a tracking image feature corresponding to a target to be tracked in the reference frame image includes:
Determining a reference center point coordinate of an external reference detection frame corresponding to the target to be tracked in the reference frame image;
Determining a shallow reference mapping relation between an image corresponding to the external reference detection frame and a corresponding reference feature map in the shallow convolution module, and determining a deep reference mapping relation between the image corresponding to the external reference detection frame and the corresponding reference feature map in the deep convolution module;
Determining shallow reference center point coordinates of an external reference detection frame corresponding to the target to be tracked in the shallow convolution module based on the shallow reference mapping relation and the reference center point coordinates, and determining deep reference center point coordinates of the external reference detection frame corresponding to the target to be tracked in the deep convolution module based on the deep reference mapping relation and the reference center point coordinates;
calculating a shallow reference distance between the reference center point coordinate and the shallow reference center point coordinate, and calculating a deep reference distance between the reference center point coordinate and the deep reference center point coordinate;
Determining a minimum reference distance from the shallow reference distance and the deep reference distance, and determining a reference feature vector under a convolution module corresponding to the minimum reference distance as a tracking image feature corresponding to a target to be tracked in the reference frame image;
The determining, based on the shallow layer screening feature vector and the deep layer screening feature vector, screening image features corresponding to the objects to be screened in the target frame image includes:
determining screening center point coordinates of an external screening detection frame corresponding to each target to be screened in the target frame image;
determining a shallow screening mapping relation between an image corresponding to the external screening detection frame and a corresponding screening feature image in the shallow convolution module, and determining a deep screening mapping relation between the image corresponding to the external screening detection frame and the corresponding screening feature image in the deep convolution module;
Determining shallow screening center point coordinates of an external screening detection frame corresponding to each target to be screened in the shallow convolution module based on the shallow screening mapping relation and the screening center point coordinates, and determining deep screening center point coordinates of an external screening detection frame corresponding to each target to be screened in the deep convolution module based on the deep screening mapping relation and the screening center point coordinates;
Calculating a shallow screening distance between the screening center point coordinates and the shallow screening center point coordinates, and calculating a deep screening distance between the screening center point coordinates and the deep screening center point coordinates;
And determining the minimum screening distance from the shallow screening distance and the deep screening distance, and determining the screening feature vector under the convolution module corresponding to the minimum screening distance as the screening image feature corresponding to each target to be screened in the target frame image.
5. The method according to claim 4, wherein the determining whether there is a region overlap in the circumscribed screening detection frame corresponding to each of the objects to be screened in the target frame image includes:
Determining whether shallow screening center points of any two external screening detection frames in the external screening detection frames coincide or not based on shallow screening center point coordinates of the external screening detection frames corresponding to the targets to be screened in the shallow convolution module;
if the shallow screening center points coincide, judging that the external screening detection frames corresponding to the targets to be screened have region overlapping;
If the shallow screening center points are not coincident, determining whether the deep screening center points of any two external screening detection frames in the external screening detection frames are coincident or not based on the deep screening center point coordinates of the external screening detection frames corresponding to the targets to be screened in the deep convolution module;
If the deep screening center points coincide, judging that the external screening detection frames corresponding to the targets to be screened have region overlapping;
If the deep screening center points are not coincident, judging that the external screening detection frames corresponding to the targets to be screened are not overlapped in area.
6. The method of claim 1, wherein the performing similarity matching between the target to be tracked and each target to be screened based on the tracking image features, the reference area, the reference aspect ratio, the screening image features, the screening area, and the screening aspect ratio comprises:
calculating a feature similarity between the target to be tracked and any one target to be screened based on the tracking image features and the screening image features of the target to be screened;
determining a squared area difference between the reference area and the screening area corresponding to the target to be screened;
determining a squared aspect ratio difference between the reference aspect ratio and the screening aspect ratio corresponding to the target to be screened;
determining evaluation weights corresponding to the feature similarity, the squared area difference, and the squared aspect ratio difference;
and summing the feature similarity, the squared area difference, and the squared aspect ratio difference according to the evaluation weights to obtain a similarity matching result between the target to be tracked and the target to be screened.
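Claim 6 combines one appearance cue with two geometry cues in a weighted sum. A hedged sketch, with cosine similarity standing in for the unspecified feature similarity; the negative weights on the squared-difference terms (so that size and shape mismatch lower the score) and the weight values themselves are illustrative assumptions:

```python
import numpy as np

def similarity_match(track_feat, screen_feat,
                     ref_area, screen_area, ref_ratio, screen_ratio,
                     w_feat=1.0, w_area=-0.5, w_ratio=-0.5):
    """Weighted combination of appearance and geometry cues; a larger
    score means a better match under the assumed sign convention."""
    feat_sim = float(np.dot(track_feat, screen_feat) /
                     (np.linalg.norm(track_feat) * np.linalg.norm(screen_feat)))
    area_sqdiff = (ref_area - screen_area) ** 2
    ratio_sqdiff = (ref_ratio - screen_ratio) ** 2
    # Weighted sum of the three terms yields the matching result.
    return w_feat * feat_sim + w_area * area_sqdiff + w_ratio * ratio_sqdiff
```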
7. The method of claim 1, wherein the clustering the non-tracking targets based on the non-tracking image features to obtain a cluster center comprises:
initializing centroids corresponding to different clusters, and determining a centroid vector corresponding to each centroid;
calculating distances between each non-tracking target and the centroids of the different clusters based on the non-tracking image features and the centroid vectors, and assigning each non-tracking target to one of the clusters based on the distances;
determining updated centroids for the different clusters based on the non-tracking image features of the non-tracking targets within each cluster;
and reassigning the non-tracking targets to the clusters based on the non-tracking image features and the centroid vectors of the updated centroids until the updated centroids no longer change, and determining the final updated centroid as the cluster center.
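Claim 7 describes standard k-means over the non-tracking image features: assign each feature to its nearest centroid, recompute centroids as cluster means, and repeat until the centroids stop changing. A compact sketch (random-sample initialization and the value of k are assumptions the claim leaves open):

```python
import numpy as np

def cluster_centers(features, k=3, max_iter=100, seed=0):
    """k-means over an N x D array of non-tracking image features; the
    converged centroids serve as the cluster centers."""
    rng = np.random.default_rng(seed)
    centroids = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(max_iter):
        # Assign every feature vector to its nearest centroid.
        dists = np.linalg.norm(features[:, None, :] - centroids[None], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid as the mean of its members.
        updated = np.stack([features[labels == j].mean(axis=0)
                            if np.any(labels == j) else centroids[j]
                            for j in range(k)])
        if np.allclose(updated, centroids):  # centroids no longer change
            return updated
        centroids = updated
    return centroids
```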
8. A target tracking device, comprising:
an acquisition unit, configured to acquire reference frame image information, wherein the reference frame image information comprises a target to be tracked on which a circumscribed reference detection frame is drawn, together with a reference area and a reference aspect ratio of the circumscribed reference detection frame; the reference frame image information further comprises non-tracking targets on each of which a circumscribed non-tracking detection frame is drawn, the non-tracking targets being of the same category as the target to be tracked;
a determining unit, configured to take the frame image following the reference frame image as a target frame image, determine each target to be screened in the target frame image that is of the same category as the target to be tracked, draw a circumscribed screening detection frame corresponding to each target to be screened in the target frame image, and determine a screening area and a screening aspect ratio of each circumscribed screening detection frame;
a feature extraction unit, configured to input the image corresponding to the circumscribed reference detection frame into a preset feature extraction model for feature extraction to obtain tracking image features corresponding to the target to be tracked in the reference frame image, and to input the images corresponding to the circumscribed screening detection frames into the preset feature extraction model for feature extraction to obtain screening image features corresponding to each target to be screened in the target frame image;
a judging unit, configured to judge whether region overlap exists among the circumscribed screening detection frames corresponding to the targets to be screened in the target frame image;
a matching unit, configured to perform, if region overlap exists, similarity matching between the target to be tracked and each target to be screened based on the tracking image features, the reference area, the reference aspect ratio, the screening image features, the screening area, and the screening aspect ratio, and otherwise to perform similarity matching between the target to be tracked and each target to be screened based on the tracking image features and the screening image features;
a target tracking unit, configured to determine, based on the similarity matching result, the target to be tracked among the targets to be screened contained in the target frame image, so as to track the target to be tracked;
wherein the target tracking unit is further configured to: input the images corresponding to the circumscribed non-tracking detection frames into the preset feature extraction model for feature extraction to obtain non-tracking image features corresponding to each non-tracking target in the reference frame image; cluster the non-tracking targets based on the non-tracking image features to obtain a cluster center, and determine centroid features corresponding to the cluster center; calculate a degree of dissimilarity between the cluster center and each target to be screened based on the centroid features and the screening image features corresponding to each target to be screened; determine, among the degrees of dissimilarity, target dissimilarity degrees greater than a preset dissimilarity threshold; determine, among the feature similarities between the target to be tracked and the targets to be screened, target feature similarities greater than a preset similarity threshold; and determine the overlapping screening targets that appear both among the targets to be screened corresponding to the target dissimilarity degrees and among the targets to be screened corresponding to the target feature similarities, and determine, based on the overlapping screening targets, the target to be tracked in the target frame image for tracking.
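The last limitation of claim 8 intersects two candidate sets: targets sufficiently dissimilar from the distractor cluster center, and targets sufficiently similar to the tracked target. A sketch of that filter, with cosine-based measures and both threshold values as illustrative assumptions:

```python
import numpy as np

def pick_tracked_target(centroid_feat, track_feat, screen_feats,
                        dissim_thresh=0.5, sim_thresh=0.7):
    """Keep candidates dissimilar to the distractor cluster center AND
    similar to the tracked target, then take their intersection."""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    dissimilar = {i for i, f in enumerate(screen_feats)
                  if 1.0 - cos(centroid_feat, f) > dissim_thresh}
    similar = {i for i, f in enumerate(screen_feats)
               if cos(track_feat, f) > sim_thresh}
    overlap = dissimilar & similar  # the overlapping screening targets
    # Track the surviving candidate most similar to the target, if any.
    return max(overlap, key=lambda i: cos(track_feat, screen_feats[i]),
               default=None)
```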
9. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 7.
10. A computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the method of any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410718073.8A (granted as CN118314363B) | 2024-06-05 | 2024-06-05 | Target tracking method, device, storage medium and computer equipment
Publications (2)
Publication Number | Publication Date |
---|---|
CN118314363A (en) | 2024-07-09
CN118314363B (en) | 2024-09-10
Family
ID=91720789
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410718073.8A (granted as CN118314363B, active) | Target tracking method, device, storage medium and computer equipment | 2024-06-05 | 2024-06-05 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118314363B (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110827325A (en) * | 2019-11-13 | 2020-02-21 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Target tracking method and device, electronic equipment and storage medium |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9141196B2 (en) * | 2012-04-16 | 2015-09-22 | Qualcomm Incorporated | Robust and efficient learning object tracker |
CN109117794A (en) * | 2018-08-16 | 2019-01-01 | Guangdong University of Technology | Moving target behavior tracking method, apparatus, device and readable storage medium |
CN111696130B (en) * | 2019-03-12 | 2024-06-21 | Beijing Jingdong Shangke Information Technology Co., Ltd. | Target tracking method, target tracking device, and computer-readable storage medium |
2024-06-05: CN application CN202410718073.8A filed; granted as CN118314363B (status: active).
Also Published As
Publication number | Publication date |
---|---|
CN118314363A (en) | 2024-07-09
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |