Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
When object tracking is performed, the target tracking object is captured by a camera; for example, a pedestrian is captured by a monocular camera. In the existing object tracking technology, when a camera is used to capture pedestrians, on the one hand, pedestrians have varied postures and diverse outline appearances, and on the other hand, the environments in which pedestrians are located and the shooting illumination conditions are quite complex. For these reasons, missed detections and false positives easily occur when capturing pedestrians, which directly reduces the accuracy of object tracking. In view of this defect, the core of the embodiment of the present application is to acquire the features of the detection objects in the current image frame and to match these features against the target tracking object determined in the previous image frame, thereby acquiring the target tracking object in the current image frame and avoiding missed detections and false positives.
Fig. 1 is a schematic flowchart of an object tracking method provided in an embodiment of the present application, and with reference to fig. 1, the method includes:
step 101, marking a bounding box of at least one detected object in the current image frame.
Step 102, respectively calculating the coincidence degree of the bounding box of at least one detection object and the bounding box of the target tracking object in the previous image frame.
Step 103, determining the target tracking object from the at least one detection object according to the calculated at least one coincidence degree.
In step 101, the current image frame is the image frame captured at the current time, and it contains at least one detection object. One of the detection objects in the current image frame may be the target tracking object, or none of them may be the target tracking object (meaning that tracking fails).
Optionally, in this embodiment, in the process of implementing object tracking, an image is captured by a camera so as to capture the object; for example, the object may be captured by a monocular camera. The camera may be mounted on a tracking device. Based on this, shooting may be performed by a camera mounted on the tracking device to obtain the current image frame. The implementation of the tracking device may vary depending on the application scenario. For example, when the technical solution of the embodiment of the present application is applied to a robot object tracking scene, the robot may serve as the tracking device, and a monocular camera may be mounted on a certain part of the robot, for example, the head of the robot, to photograph pedestrians.
After the current image frame is captured, a bounding box of at least one detected object in the current image frame may be marked.
In an alternative embodiment, after the detection objects are captured, marking the bounding box of at least one detection object in the current image frame can be implemented by the Faster Region-based Convolutional Neural Network (Faster R-CNN) algorithm. As noted above, pedestrians have varied postures and diverse outline appearances, and the environments and shooting illumination conditions are quite complex, so missed detections and false positives easily occur when capturing pedestrians, which directly reduces the accuracy of object tracking. The Faster R-CNN algorithm has a low miss rate and a low false positive rate in target detection, which lays a foundation for accurate, high-accuracy object tracking. As the Faster R-CNN algorithm is mature prior art, it is not described in detail in the embodiments of the present application.
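By way of illustration only, a minimal sketch of this detection step using the pre-trained Faster R-CNN detector shipped with torchvision (the embodiment does not prescribe a particular implementation; the function name mark_bounding_boxes, the score threshold, and the use of the COCO person class are assumptions of the sketch, and a recent torchvision version is assumed):

import torch
import torchvision
from torchvision.transforms.functional import to_tensor

# Load a pre-trained Faster R-CNN detector (recent torchvision assumed).
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

def mark_bounding_boxes(frame_rgb, score_thresh=0.5, person_label=1):
    # Return [x1, y1, x2, y2] boxes of confident person detections in one frame.
    with torch.no_grad():
        pred = model([to_tensor(frame_rgb)])[0]
    boxes = []
    for box, label, score in zip(pred["boxes"], pred["labels"], pred["scores"]):
        # COCO class 1 is "person"; keep only sufficiently confident detections.
        if label.item() == person_label and score.item() >= score_thresh:
            boxes.append([float(v) for v in box])
    return boxes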
In an alternative embodiment, the at least one detection object in the current image frame is all of the objects present in the current image frame. Marking the bounding boxes of all detection objects in the current image frame helps avoid missed detections that might otherwise occur.
In another alternative embodiment, before marking the bounding box of at least one detection object in the current image frame, the detection objects in the current image frame may be preliminarily screened to determine the at least one detection object whose bounding box needs to be marked; that is, detection objects meeting a marking condition are selected from all detection objects in the current image frame for bounding-box marking. For example, detection objects with more regular bounding boxes may be selected for marking, or detection objects with larger bounding boxes (for example, a bounding box area larger than a preset value) may be selected for marking. This marking mode increases the marking speed, reduces the amount of data to be processed for object tracking, improves the object tracking efficiency, and saves resources.
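A minimal sketch of such a pre-screening, assuming boxes are given as [x1, y1, x2, y2] and that the preset value is an area threshold (both assumptions of the sketch, not specified by the embodiment):

# Hypothetical pre-screening of detected boxes by bounding-box area.
def prescreen_boxes(boxes, min_area=3200.0):
    kept = []
    for x1, y1, x2, y2 in boxes:
        # Keep only detection objects whose bounding-box area exceeds the preset value.
        if (x2 - x1) * (y2 - y1) > min_area:
            kept.append([x1, y1, x2, y2])
    return kept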
For step 102, the previous image frame is the image frame whose frame number immediately precedes that of the current image frame. For example, if the frame number of the current image frame is N, the image frame with frame number N-1 is the previous image frame.
In the previous image frame, the target tracking object has been determined, i.e. the previous image frame contains the target tracking object and the bounding box of the target tracking object. Optionally, if the previous image frame is the first frame, the detected object with the largest bounding box may be directly selected as the target tracking object, as in the prior art. If the previous image frame is not the first frame, the method provided by the embodiment shown in fig. 1 may be used to determine the target tracking object from the detection objects included in the previous image frame.
After the at least one detection object in the current image frame has been marked with a bounding box, the coincidence degree between the bounding box of each of the at least one detection object in the current image frame and the bounding box of the target tracking object in the previous image frame is calculated, yielding at least one coincidence degree. For example, assuming the current image frame includes M detection objects, the coincidence degrees between the bounding boxes of the M detection objects and the bounding box of the target tracking object in the previous image frame are calculated respectively, giving M coincidence degrees. In general, when the shooting angle does not change between two successive frames, the coincidence degree of the bounding boxes contained in the two frames can be obtained by superimposing the two frames.
In step 103, after the at least one coincidence degree is obtained, the target tracking object is determined from the at least one detection object based on the at least one coincidence degree. It is worth mentioning that, in theory, the target tracking object determined in the current image frame should be the same object as the target tracking object determined in the previous image frame.
In this embodiment, the bounding box of at least one detection object in the current image frame is marked, the coincidence degree between the bounding box of the at least one detection object and the bounding box of the target tracking object in the previous image frame is calculated, and the target tracking object is determined from the at least one detection object based on that coincidence degree. This overcomes the defect that object tracking is easily affected by the environment in which the object is located, and achieves more accurate pedestrian tracking.
Fig. 2 is another technical flowchart of an object tracking method provided in an embodiment of the present application, and in conjunction with fig. 2, the method includes:
step 201, marking a bounding box of at least one detected object in the current image frame.
Step 202, for any one of the at least one detection object, respectively calculating the intersection area and the union area of the bounding box of the detection object and the bounding box of the target tracking object in the previous image frame.
Step 203, determining the coincidence degree of the bounding box of the detection object and the bounding box of the target tracking object in the previous image frame according to the area of the intersection and the area of the union.
Step 204, judging whether a coincidence degree meeting a preset coincidence degree condition exists among the calculated at least one coincidence degree.
If a coincidence degree meeting the preset coincidence degree condition exists among the calculated at least one coincidence degree, step 205 is executed; if no coincidence degree meeting the preset coincidence degree condition exists among the calculated at least one coincidence degree, steps 206 to 208 are executed.
Step 205, marking the detected object corresponding to the coincidence degree meeting the preset coincidence degree condition as the target tracking object.
Step 206, obtaining a color dense histogram of the at least one detection object.
Step 207, calculating the similarity between the at least one detection object and the target tracking object according to the color dense histogram of the at least one detection object and the color dense histogram of the target tracking object in at least one image frame before the current image frame.
Step 208, acquiring a detection object whose similarity meets the set requirement from the at least one detection object as the target tracking object.
The specific implementation of step 201 is described in the embodiment shown in fig. 1, and is not described herein again.
For step 202, the current image frame captured by the camera has the same shooting angle and shooting parameters as the previous image frame. After the current image frame is obtained, it is superimposed on the previous image frame so as to calculate the area of the intersection and the area of the union formed between the bounding box of each of the at least one detection object in the current image frame and the bounding box of the target tracking object in the previous image frame.
For example, the bounding box marking the first detection object in the current image frame is A1, the bounding box marking the second detection object is A2, the bounding box marking the third detection object is A3, and the bounding box of the target tracking object in the previous image frame is B. The following are calculated respectively: the area A1∩B of the intersection of bounding box A1 with bounding box B and the area A1∪B of the union of bounding box A1 with bounding box B; the area A2∩B of the intersection of bounding box A2 with bounding box B and the area A2∪B of the union of bounding box A2 with bounding box B; the area A3∩B of the intersection of bounding box A3 with bounding box B and the area A3∪B of the union of bounding box A3 with bounding box B.
For step 203, based on the areas of the intersections and the unions obtained by the above calculation, the coincidence degree between the bounding box of each of the at least one detection object in the current image frame and the bounding box of the target tracking object in the previous image frame may be calculated.
In an alternative embodiment, the coincidence degree is the area of the intersection/the area of the union, but the calculation is not limited thereto.
Following the above example, the first coincidence degree of the first detection object in the current image frame with the target tracking object = (A1∩B)/(A1∪B); the second coincidence degree of the second detection object in the current image frame with the target tracking object = (A2∩B)/(A2∪B); the third coincidence degree of the third detection object in the current image frame with the target tracking object = (A3∩B)/(A3∪B).
Alternatively, the coincidence degree may be calculated as: coincidence degree = the area of the intersection/(the area of the intersection + the area of the union). Alternatively, the coincidence degree may be calculated based on preset coefficients K1 and K2 as: coincidence degree = (K1 × the area of the intersection)/(K2 × the area of the union).
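A minimal sketch of the basic variant (coincidence degree = area of the intersection / area of the union), assuming axis-aligned bounding boxes given as [x1, y1, x2, y2]:

def coincidence_degree(box_a, box_b):
    # coincidence degree = area of intersection / area of union
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the intersection rectangle (zero if the boxes do not overlap).
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0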
For step 204, after the coincidence degree between the bounding box of each of the at least one detection object in the current image frame and the bounding box of the target tracking object in the previous image frame is acquired, at least one coincidence degree is obtained. The target tracking object is then determined from the at least one detection object based on the value of the at least one coincidence degree.
Optionally, the preset coincidence degree condition may be a coincidence degree threshold. The threshold is generally an empirical value, and its value is related to the shooting interval of the camera. If the time interval between two captured frames is small and the captured detection object moves over a small range, the coincidence degree between the positions of the target tracking object in the two successive image frames is high; for this case, the coincidence degree threshold may be set relatively small. Conversely, if the time interval between two captured frames is large and the captured detection object moves over a large range, the coincidence degree between the positions of the target tracking object in the two successive image frames is low; for this case, the coincidence degree threshold may be set relatively large. In this embodiment, when the shooting frequency of the camera is set to 5 frames per second, the coincidence degree threshold may be set to 0.6. With this shooting frequency and coincidence degree threshold, determining the detection object whose coincidence degree is greater than 0.6 to be the target tracking object yields a more accurate detection result.
In step 205, in one possible case, when the preset coincidence degree condition is that the coincidence degree is greater than 0.6, a coincidence degree greater than 0.6 is selected from the at least one coincidence degree, and the detection object corresponding to that coincidence degree is taken as the target tracking object.
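A minimal sketch of steps 204 and 205 under the threshold condition of this embodiment, reusing the coincidence_degree sketch above (the function name match_by_coincidence is an assumption of the sketch); a return value of None means no coincidence degree meets the condition and steps 206 to 208 are executed instead:

def match_by_coincidence(det_boxes, target_box, thresh=0.6):
    # Return the index of the detection object whose coincidence degree with the
    # target bounding box of the previous frame is highest and exceeds thresh.
    best_idx, best_c = None, thresh
    for i, box in enumerate(det_boxes):
        c = coincidence_degree(box, target_box)
        if c > best_c:
            best_idx, best_c = i, c
    return best_idx  # None: no coincidence degree meets the preset condition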
For step 206, in one possible case, the detection object moves quickly within the shooting interval of the camera, so there may be no object whose bounding-box coincidence degree between the two successive frames meets the coincidence degree condition; that is, it cannot be determined, based on the coincidence degree condition, which detection object in the current image frame is the target tracking object. In this case, the color dense histogram of each of the at least one detection object marked in the current image frame is further acquired. For example, if M detection objects are marked in the current image frame, the color dense histograms of the M detection objects are calculated respectively, obtaining M color dense histograms. The color dense histogram represents the color features of a detection object and describes the proportions of different colors in the image of the detection object, so a detection object can be accurately identified from its color dense histogram.
Wherein a color dense histogram may be computed from the image portions within the bounding box.
Optionally, the color dense histogram is calculated by the calcHist function of the Open Source Computer Vision Library (OpenCV). Since OpenCV is mature prior art, the specific histogram calculation process is not described in detail in the embodiments of the present application.
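A minimal sketch of such a histogram for the image region inside a bounding box, using OpenCV's calcHist in Python; the HSV color space and the 16 × 4 bin layout are assumptions chosen to match the Num = 16 × 4 bins mentioned later in this embodiment, and boxes are assumed to be [x1, y1, x2, y2]:

import cv2

def color_dense_histogram(frame_bgr, box):
    # Crop the image region selected by the bounding box [x1, y1, x2, y2].
    x1, y1, x2, y2 = [int(round(v)) for v in box]
    roi = frame_bgr[y1:y2, x1:x2]
    hsv = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
    # 16 hue bins x 4 saturation bins, then normalize to [0, 1].
    hist = cv2.calcHist([hsv], [0, 1], None, [16, 4], [0, 180, 0, 256])
    return cv2.normalize(hist, hist, 0, 1, cv2.NORM_MINMAX)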
For step 207, in an alternative embodiment, the similarity between each of the at least one detection object and the target tracking object is calculated as follows: the similarity between the color dense histogram of the detection object and the color dense histogram of the target tracking object in each of at least one image frame before the current image frame is calculated, obtaining at least one color-dense-histogram similarity; the average of these similarities is then taken as the similarity between the detection object and the target tracking object.
The color dense histograms of the target tracking object in at least one image frame before the current image frame include the color dense histogram of the target tracking object in the previous image frame and the color dense histograms of the target tracking object in several image frames before the previous image frame. That is, assuming the frame number of the current image frame is N, and frames N-1 to N-n are the n image frames preceding the current image frame, the color dense histograms of the target tracking object in at least one image frame before the current image frame include: the color dense histogram H-1 of the target tracking object in frame N-1, the color dense histogram H-2 of the target tracking object in frame N-2, ..., and the color dense histogram H-n of the target tracking object in frame N-n, where n is an empirical value and is an integer greater than or equal to 0 and less than or equal to N.
Assume that, in the current image frame, the color dense histogram of the first of the three marked detection objects is H1, the color dense histogram of the second detection object is H2, and the color dense histogram of the third detection object is H3. For the first detection object, the similarity between it and the target tracking object is calculated as follows: the similarities between the color dense histogram of the first detection object and the color dense histograms of the target tracking object in the n image frames before the current image frame are calculated respectively, obtaining n color-dense-histogram similarities; the average of these n similarities is then taken as the similarity between the first detection object and the target tracking object.
For example, the calculation process of the similarity between the first detection object and the target tracking object is as follows:
S1=[similarity(H1,H-1)+similarity(H1,H-2)+……+similarity(H1,H-n)]/n
the calculation process of the similarity between the second detection object and the target tracking object is as follows:
S2=[similarity(H2,H-1)+similarity(H2,H-2)+……+similarity(H2,H-n)]/n
the calculation process of the similarity between the third detection object and the target tracking object is as follows:
S3=[similarity(H3,H-1)+similarity(H3,H-2)+……+similarity(H3,H-n)]/n
wherein similarity () is a similarity calculation function. In the execution mode, the similarity of the detected object in the current image frame and at least one dense historical color histogram of the target tracking object determined in the previous image frame is compared, so that the influence of the posture, the contour appearance, the environment where the pedestrian is located and the shooting illumination condition of the pedestrian on tracking is reduced, and the reliability of the calculation result of the similarity is further improved.
For step 208, in an alternative embodiment, after the similarity between each detection object and the target tracking object is calculated, whether each similarity meets the set requirement is judged one by one. Optionally, the requirement set in this embodiment is: the similarity is greater than a similarity threshold and is the maximum of the at least one similarity.
The value of the similarity threshold is an empirical value, related to the environment in which the detection object is located and the shooting illumination conditions. In actual operation of the embodiment of the present application, tests in a conventional office area show that a more accurate tracking result can be obtained when the similarity threshold is 0.7.
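A minimal sketch of steps 206 to 208 under the set requirement described above (the function names and the similarity_fn parameter, which stands for the histogram comparison function described later, are assumptions of the sketch):

def object_similarity(det_hist, target_hists, similarity_fn):
    # Average similarity of one detection object's histogram with every stored
    # historical histogram of the target tracking object.
    scores = [similarity_fn(det_hist, h) for h in target_hists]
    return sum(scores) / len(scores)

def match_by_similarity(det_hists, target_hists, similarity_fn, thresh=0.7):
    # Take the detection object with the largest average similarity, but only
    # if that similarity also exceeds the similarity threshold.
    sims = [object_similarity(h, target_hists, similarity_fn) for h in det_hists]
    if not sims:
        return None
    best = max(range(len(sims)), key=lambda i: sims[i])
    return best if sims[best] > thresh else None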
In this embodiment, the bounding box of at least one detection object in the current image frame is obtained, and its coincidence degree with the bounding box of the target tracking object in the previous image frame is calculated. The target tracking object is determined from the at least one detection object based on the coincidence degree between the bounding box of the at least one detection object in the current image frame and the bounding box of the target tracking object in the previous image frame. This avoids missed detections and false positives and improves the accuracy of object tracking. When the coincidence degree cannot determine whether the target tracking object exists in the current image frame, the color dense histogram of the at least one detection object in the current image frame is further acquired, its similarity with the color dense histogram of the target tracking object in at least one image frame before the current image frame is calculated, and the target tracking object is acquired from the at least one detection object in the current image frame according to the color-dense-histogram similarity. Based on this scheme, the influence on tracking of the pedestrian's posture, outline appearance, surrounding environment, and shooting illumination conditions is reduced, and highly accurate object tracking is achieved.
It should be noted that, in the technical solution of the embodiment of the present application, for each current image frame, the correspondence among the frame number of the current image frame, the bounding box of the target tracking object in the current image frame, and the color dense histogram of the target tracking object in the current image frame is stored.
Optionally, in actual operation, when storing the color dense histogram of the target tracking object in the current image frame, for each image frame, the color dense histogram of the target tracking object in that image frame is stored in a list with a fixed length of n in order of frame number. When the number of color dense histograms stored in the list is less than n, color dense histograms of the target tracking object in new image frames continue to be appended in increasing order of frame number. Once the list is full, the color dense histogram of the target tracking object in the image frame with the largest frame number (the current image frame, with frame number N) is added at one end of the list, and the color dense histogram of the target tracking object in the image frame with the smallest frame number (the image frame with frame number N-n) is deleted from the other end of the list. The list length n is an empirical value; through a number of experimental trials, the most accurate similarity calculation result and higher calculation efficiency are obtained when n is 10. The stored content is used for comparison when identifying the target tracking object in the next image frame, and details are not repeated here.
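A minimal sketch of this fixed-length history using a deque with maxlen = n, which automatically drops the oldest histogram when a new one is appended (the variable and function names are assumptions of the sketch):

from collections import deque

n = 10  # fixed list length found empirically to work well in this embodiment
target_hist_history = deque(maxlen=n)  # keeps only the n most recent histograms

def record_target_frame(store, frame_number, target_box, target_hist):
    # Store the per-frame correspondence (frame number, bounding box, histogram)
    # and append the new histogram; when full, the oldest entry is dropped.
    store[frame_number] = {"bounding_box": target_box, "histogram": target_hist}
    target_hist_history.append(target_hist)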
In the following, the technical solution of the embodiment of the present application is described in detail with a specific example in combination with an application scenario.
A monocular camera is mounted on the head of the intelligent robot and is used to photograph the interactive object, enabling detection of the interactive object while the robot performs the task of tracking it.
When the interactive object performs human-computer interaction with the robot, the interactive object is photographed by the monocular camera to obtain a first image frame. In the first image frame, the bounding box of each detected object is obtained using the Faster R-CNN algorithm.
The detection object corresponding to the bounding box with the largest area is selected as the interactive object and marked as b-box0, and the color dense histogram of the image region selected by b-box0 is calculated and denoted H0. It should be understood that, since the interactive object is performing human-computer interaction with the robot, the interactive object is closest to the robot and the robot's eye assembly is rotated to face the interactive object; therefore, in the image captured by the monocular camera, the interactive object is located in the middle of the field of view and its bounding box area is the largest.
After the interactive object is determined from the first image frame, a second image frame is captured at the same shooting angle as the first image frame, with a shooting interval of 0.2 s. In the second image frame, the bounding boxes of the detected objects are obtained using the Faster R-CNN algorithm. Assume that, in the second image frame, the bounding boxes corresponding to four detection objects are obtained: b-box1, b-box2, b-box3, and b-box4.
The coincidence degrees C1, C2, C3, and C4 between the bounding box of each detection object in the second image frame and the bounding box of the interactive object determined in the first image frame are calculated respectively:
C1=(b-box0∩b-box1)/(b-box0∪b-box1)
C2=(b-box0∩b-box2)/(b-box0∪b-box2)
C3=(b-box0∩b-box3)/(b-box0∪b-box3)
C4=(b-box0∩b-box4)/(b-box0∪b-box4)
in this embodiment, the overlap ratio threshold is set to 0.6.
In one scenario, assuming that the values of C1, C2, and C3 are 0.2, 0.7, and 0.1, respectively, and that C4 does not exceed 0.6 either, then C2 = 0.7 > 0.6, and the detection object corresponding to C2 is tracked as the interactive object.
In another scenario, assuming that the values of C1, C2, and C3 are 0.4, 0.5, and 0.1, respectively, and that C4 does not exceed 0.6 either, no coincidence degree greater than 0.6 exists. The color dense histograms of the four detection objects acquired in the second image frame are then calculated, that is, the color dense histograms of the image regions selected by b-box1, b-box2, b-box3, and b-box4, obtaining color dense histograms H1, H2, H3, and H4 respectively.
Calculating the similarity of the color dense histogram H0 of the determined interactive object in the first image frame and the color dense histograms H1, H2, H3 and H4 of the four detected objects in the second image frame respectively:
S1=similarity(H0,H1);S2=similarity(H0,H2);
S3=similarity(H0,H3);S4=similarity(H0,H4);
in this embodiment, the similarity threshold is set to 0.7.
After obtaining the four similarities, the similarity greater than the similarity threshold of 0.7 among S1, S2, S3, S4 is selected. Assuming that the values of S1, S2, S3, and S4 are 0.1, 0.75, 0.1, and 0.05, respectively, and S2 is 0.75>0.7, it is determined that the detection object corresponding to S2 is an interactive object and tracking is performed. The b-box2 corresponding to the interaction object in the second image frame is saved along with the color dense histogram H2.
After the position of the interactive object is acquired in the second image frame, the robot tracks the interactive object: it moves its body closer to the interactive object and re-adjusts the camera so that it is aimed at the interactive object. After 0.2 s, the camera captures a third image frame.
In the third image frame, the Faster R-CNN algorithm is used to acquire the bounding boxes of two detected objects, marked b-box21 and b-box22. The coincidence degrees between b-box21, b-box22 and the bounding box b-box2 of the interactive object determined in the second image frame are calculated respectively, and whether either of the two coincidence degrees is greater than 0.6 is judged; if so, the detection object corresponding to the coincidence degree greater than 0.6 is tracked as the interactive object.
In another scenario, if neither of the coincidence degrees calculated between b-box2 and b-box21, b-box22 exceeds 0.6, the color dense histograms of the two detection objects acquired in the third image frame are calculated, that is, the color dense histograms of the image regions selected by b-box21 and b-box22, obtaining H21 and H22 respectively.
For each of the two detection objects in the third image frame, the average of the similarities between its color dense histogram (H21 or H22) and the color dense histogram H0 of the interactive object determined in the first image frame and the color dense histogram H2 of the interactive object determined in the second image frame is calculated:
S21=[similarity(H0,H21)+similarity(H2,H21)]/2
S22=[similarity(H0,H22)+similarity(H2,H22)]/2
wherein similarity() can be implemented by the compareHist function of OpenCV:
cvCompareHist(const CvHistogram* hist1, const CvHistogram* hist2, int method);
method: CV_COMP_CORREL
wherein the correlation method computes the similarity as:
d(h1, h2) = Σi h'1(i)·h'2(i) / sqrt( Σi h'1(i)² · Σj h'2(j)² ), with h'k(i) = hk(i) − (1/Num)·Σj hk(j),
where k is the index of the color dense histogram, d(h1, h2) represents the similarity between the color dense histograms h1 and h2, i and j are the bin indices used in the summations, Num is the number of bins of the color dense histogram (which may be set to Num = 16 × 4 in this embodiment), hk(i) is the i-th bin of the k-th color dense histogram, and h'k(i) is the mean-normalized value of that bin.
In the above statement, cvCompareHist is the comparison function for color dense histograms, CvHistogram is the structure used to represent multidimensional histograms, and hist1 and hist2 are the two color dense histograms being compared.
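In OpenCV's Python API, the same comparison can be written as follows (cv2.HISTCMP_CORREL corresponds to CV_COMP_CORREL):

import cv2

def similarity(hist1, hist2):
    # Correlation-based histogram comparison, as with CV_COMP_CORREL above.
    return cv2.compareHist(hist1, hist2, cv2.HISTCMP_CORREL)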
After the two similarities are obtained, the similarities among S21 and S22 that are greater than the similarity threshold 0.7 are selected. Assuming the values of S21 and S22 are 0.2 and 0.8, respectively, then S22 = 0.8 > 0.7, so the detection object corresponding to S22 is determined to be the interactive object and is tracked. The b-box22 corresponding to the interactive object in the third image frame is saved, along with its color dense histogram H22.
After the position of the interactive object is acquired in the third image frame, the robot tracks the interactive object: it moves its body closer to the interactive object and re-adjusts the camera so that it is aimed at the interactive object. After 0.2 s, the camera captures a fourth image frame. Subsequent tracking follows the principle described above and is not repeated here. It should be noted that, when the similarity calculation uses the color dense histograms of the interactive object in a preset number of image frames before the current image frame, the preset number should be kept within a reasonable range to avoid tracking delay caused by excessive computation.
Fig. 3 is a schematic structural diagram of an object tracking apparatus provided in the present application, and in conjunction with fig. 3, the apparatus includes:
a marking unit 301, configured to mark a bounding box of at least one detected object in a current image frame;
a calculating unit 302, configured to calculate a degree of coincidence between a bounding box of at least one detected object and a bounding box of a target tracking object in a previous image frame, respectively;
a determining unit 303, configured to determine the target tracking object from the at least one detection object according to the calculated at least one coincidence degree.
Further optionally, the calculating unit 302 is configured to: for any detection object in the at least one detection object, calculate the area of the intersection and the area of the union of the bounding box of the detection object and the bounding box of the target tracking object in the previous image frame; and determine the coincidence degree between the bounding box of the detection object and the bounding box of the target tracking object in the previous image frame according to the area of the intersection and the area of the union.
Further optionally, the calculating unit 302 is configured to: if a coincidence degree meeting a preset coincidence degree condition exists among the calculated at least one coincidence degree, mark the detection object corresponding to the coincidence degree meeting the preset coincidence degree condition as the target tracking object. Further optionally, the determining unit 303 is configured to: if no coincidence degree meeting the preset coincidence degree condition exists among the calculated at least one coincidence degree, obtain a color dense histogram of each detection object; calculate the similarity between the at least one detection object and the target tracking object according to the color dense histogram of the at least one detection object and at least one historical color dense histogram of the target tracking object; and acquire a detection object whose similarity meets the set requirement as the target tracking object.
Further optionally, the calculating unit 302 is configured to: if no coincidence degree meeting the preset coincidence degree condition exists among the calculated at least one coincidence degree, acquire a color dense histogram of the at least one detection object; calculate the similarity between the at least one detection object and the target tracking object according to the color dense histogram of the at least one detection object and the color dense histogram of the target tracking object in at least one image frame before the current image frame; and acquire a detection object whose similarity meets the set requirement from the at least one detection object as the target tracking object.
Further optionally, the calculating unit 302 is configured to: respectively calculating the similarity of the color dense histogram of the detection object and the color dense histogram of the target tracking object in at least one image frame before the current image frame to obtain the similarity of at least one color dense histogram;
and acquiring an average value of the similarity of the at least one color dense histogram as the similarity of the detection object and the target tracking object.
Further optionally, the object tracking apparatus provided in the embodiment of the present application further includes a storage unit. The storage unit is configured to store the correspondence among the frame number of the current image frame, the bounding box of the target tracking object in the current image frame, and the detection object delimited by the bounding box of the target tracking object in the current image frame.
In the object tracking apparatus provided in this embodiment of the present application, the marking unit 301 marks the bounding box of at least one detected object in the current image frame, the calculating unit 302 calculates the coincidence degree between the bounding box of the at least one detected object in the current image frame and the bounding box of the target tracking object in the previous image frame, and the determining unit 303 determines the target tracking object from the at least one detected object according to the calculated at least one coincidence degree. This overcomes the defect that object tracking is easily affected by the environment in which the object is located, and achieves more accurate pedestrian tracking.