CN103679742B - Method for tracing object and device - Google Patents
Abstract
A method and apparatus for tracking an object using depth images or parallax images are provided. The method includes: for a newly detected first object, extracting short-term features and long-term features of the first object from a depth image sequence containing the first object, and storing them as the short-term feature template and the long-term feature template of the first object; during tracking of the first object, for each frame of image to be analyzed, tracking based on matching the extracted short-term features of candidate objects against the stored short-term feature template of the first object; if matching based on the short-term feature template of the first object fails, identifying the first object as a missing first object; for a subsequently newly detected second object, extracting the short-term features and long-term features of the second object; and judging whether the second object is the missing first object based on matching between the short-term and long-term features of the second object and the short-term and long-term feature templates of the object identified as the missing first object.
Description
Technical Field
The present invention relates generally to object tracking methods and apparatus, and more particularly to methods and apparatus for tracking one or more objects using depth images.
Background
Object tracking technologies such as human tracking are key technologies for applications such as person positioning services, video surveillance, robot navigation, and intelligent driving assistance systems. Human tracking remains a challenging technical problem, mainly because in practical applications the posture and appearance of people change greatly and occlusion occurs frequently. In real scenes, occlusion arises especially when the tracked person moves or when other people or objects move. In indoor scenes, the complex background, the number of people present, and the high frequency of occlusion often lead to tracking failure.
Patent document US88073197B2 proposes a method for object tracking in a video sequence. The method uses per-object appearance models to distinguish different objects when occlusion occurs, and depth information is used in the model to separate different objects within the same region. The method mainly uses a gray-scale shape model and the depth order of objects to roughly separate different objects. It works relatively well under small-range occlusion but may fail when the occlusion is more severe.
Patent document US6445810B2 proposes a method and apparatus for human detection and tracking. It uses 3D data together with apparent characteristics of a person (color, facial characteristics, height, etc.), on the view that 3D data is very useful for segmenting and tracking a person and that stereoscopic-vision-based 3D data can effectively improve tracking accuracy. The method uses an RGB-based appearance model of the person, depth-based segmentation of the person, and both together for person tracking. The goal of the technique is more accurate tracking, but occlusion is not considered.
A publication by Michael Harville in 2003, entitled "Stereo person tracking with adaptive plan-view templates of height and occupancy statistics", discloses a method of computing feature statistics on a plan view and detecting and tracking a person based on those statistics. The tracking method uses statistical features of the distribution of the person's surface points on the plan view together with Kalman filtering. The plan-view approach proposed in this non-patent document (a 3D-to-2D coordinate transformation followed by histogram statistics, i.e., computing features on the plan view) can reduce the influence of occlusion to some extent and can be used in scenes with simple backgrounds such as living rooms and halls, but tracking failure can still occur in, for example, complex office scenes with many partitions.
Disclosure of Invention
The present invention has been made in view of the above circumstances.
It is an object of the invention to enable at least some degree of recovery of tracking of objects that are missing for long periods of time.
It is another object of the invention to minimize the impact of occlusion on object tracking.
It should be noted that the technical solution of the present invention does not need to achieve all of the above-mentioned objectives, but only needs to achieve one of them.
According to an aspect of the present invention, there is provided a method of tracking one or more objects using depth images or parallax images, including: for a newly detected first object, extracting short-term features and long-term features of the first object from a depth image sequence containing the first object, and storing them as a short-term feature template and a long-term feature template of the first object, where the short-term features of an object are features characterizing the object that are highly time-sensitive, and the long-term features of an object are the statistical distributions over time of all or part of its short-term features; during tracking of the first object, for each frame of image to be analyzed, tracking based on matching the extracted short-term features of candidate objects against the stored short-term feature template of the first object; identifying the first object as a missing first object if matching based on the short-term feature template of the first object fails during tracking of the first object; for a subsequently newly detected second object, extracting short-term features and long-term features of the second object from a depth image sequence containing the second object; and determining whether the second object is the missing first object based on matching between the short-term and long-term features of the second object and the short-term and long-term feature templates of the first object identified as missing.
According to the method of the above embodiment of the present invention, for an object whose tracking is lost, especially one missing for a long time, the missing object can be retrieved and tracking restored by matching the short-term and long-term features of a newly detected object against the short-term and long-term feature templates of the object identified as missing.
Furthermore, the matching process based on the extracted short-term features of the candidate object and the stored short-term feature template of the first object in the above embodiment may include: determining a confidence that the short-term features of the candidate object match the short-term feature template of the first object; if the confidence is higher than a predetermined threshold, determining that the two match; and if the confidence is lower than the predetermined threshold, judging based on predetermined rules whether the low confidence is caused by occlusion, and if so, determining that the two match, otherwise determining that they do not match.
With the above technical features, loss of objects due to occlusion can be reduced as much as possible.
According to another aspect of the present invention, there is provided an apparatus for tracking one or more objects using depth images or parallax images, including: a first object template determination section that, for a newly detected first object, extracts short-term features and long-term features of the first object from a depth image sequence containing the first object and stores them as a short-term feature template and a long-term feature template of the first object, where the short-term features of an object are features characterizing the object that are highly time-sensitive, and the long-term features of an object are the statistical distributions over time of all or part of its short-term features; a first object tracking section that, during tracking of the first object, tracks, for each frame of the image to be analyzed, based on matching the extracted short-term features of candidate objects against the stored short-term feature template of the first object; a missing object identification section that identifies the first object as a missing first object if matching based on the short-term feature template of the first object fails during tracking of the first object; a second object feature extraction section that, for a subsequently newly detected second object, extracts short-term features and long-term features of the second object from a depth image sequence containing the second object; and a lost object recovery section that determines whether the second object is the missing first object based on matching between the short-term and long-term features of the second object and the short-term and long-term feature templates of the first object identified as missing.
According to the apparatus of the above embodiment of the present invention, for an object whose tracking is lost, especially one missing for a long time, the missing object can likewise be retrieved and tracking restored by matching the short-term and long-term features of a newly detected object against the short-term and long-term feature templates of the object identified as missing.
Drawings
Fig. 1 shows an overall flowchart of an object tracking method 100 using a depth image or a parallax image according to a first embodiment of the present invention.
Fig. 2 shows a flowchart of a matching process based on the extracted short-term features of the candidate object and the stored short-term feature template of the first object according to an embodiment of the present invention.
Fig. 3 shows a schematic diagram of decision-tree judgment based on the environment and the motion of a person according to an embodiment of the invention, in which two scenarios are shown.
Fig. 4 shows a schematic diagram of decision tree determination based on motion of a person and persons in its vicinity according to an embodiment of the invention, wherein two scenarios are shown.
FIG. 5 illustrates a flow diagram of a method for recovering tracking of long term missing objects based on matching of a hybrid short term feature template and a long term feature template in accordance with an embodiment of the present invention.
Fig. 6 is a functional configuration block diagram of an object tracking apparatus according to an embodiment of the present invention.
FIG. 7 is an overall hardware block diagram illustrating an object tracking system according to an embodiment of the invention.
Detailed Description
In order that those skilled in the art will better understand the present invention, the following detailed description is provided in conjunction with the accompanying drawings and specific embodiments.
The description will be made in the following order:
1. Overview of the inventive idea
2. First embodiment
2.1 Overall flow diagram of an example object tracking method
2.2 matching processing based on short-term features
2.3 recovery of long-term lost object tracking based on matching of hybrid short-term and long-term feature templates
3. Second embodiment
3.1 calculation of weights for short-term features
3.2 calculation of weights for Long-term features
4. Object tracking apparatus
5. System hardware configuration
6. Summary
1. Overview of the inventive idea
The inventors observe that occlusion can in fact cause many types of tracking failure, mainly because when occlusion occurs the features used for tracking typically change abruptly or significantly.
The inventors believe that tracking failures can be roughly classified into two categories: short-term loss and long-term loss.
A short-term loss is one in which the tracked object can be re-acquired shortly after being lost. For example, when tracking a person based on depth images, if the walking person is occluded by an object closer to the camera, the person's shape and distance features may change greatly, and tracking of the person is then likely to fail. Within a short time the occlusion disappears, the person's shape and distance features recover, and tracking is likely to be recovered, i.e., to continue.
A long-term loss concerns the following case: the tracked object cannot be found for a long period of time after being lost. For example, a person disappears completely behind a complex or relatively tall background object. After a longer period of time, the person appears again in the vicinity. As those skilled in the art will appreciate, because of the long loss, the person, although detected again, may not be restored as the previously tracked person but may instead appear under the identity of a newly detected person.
The inventors' idea is that, given the short-term and long-term loss cases, features of different characteristics can be used, referred to here as short-term features and long-term features, where the short-term features of an object are features characterizing the object that are highly time-sensitive, and the long-term features of an object are the statistical distributions over time of all or part of its short-term features. When a long-term loss occurs, for a subsequently newly detected object, it is determined whether that object is actually the previously tracked lost object based on matching its extracted short-term and long-term features against the short-term and long-term feature templates of the lost object.
Specific embodiments for practicing the inventive concepts are described in detail below.
2. First embodiment
2.1 Overall flow diagram of an example object tracking method
Fig. 1 shows an overall flowchart of an object tracking method 100 using a depth image or a parallax image according to a first embodiment of the present invention.
Embodiments of the present invention assume that a sequence of depth images or parallax images can be obtained in real time. The sequence of depth or parallax images may be calculated from images taken by a locally arranged stereo camera, or may be obtained remotely, e.g., over a wireless or wired network.
The depth map or disparity map may be acquired with a binocular camera based on the principle of binocular ranging. As is well known to those skilled in the art, a depth map and a disparity map can be converted into each other. Any existing method of acquiring a depth map may be used with the present invention. For example, a depth map including the object may be photographed and calculated by a binocular camera, a multi-view camera, or a stereo camera. Specifically, a left image and a right image may be taken by a binocular camera, a disparity map calculated from them, and a depth map obtained from the disparity map. The above cameras are only examples; the depth map or disparity map may also be obtained with other types of stereo camera, for example ones that actively emit infrared light to assist in generating stereo information, such as the Microsoft Kinect, cameras based on infrared time-of-flight (TOF) technology, or structured-light cameras.
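As a concrete illustration of the disparity-to-depth relation mentioned above, the following is a minimal Python sketch applying the standard binocular-ranging formula Z = f·b/d; the focal length and baseline values in the usage comment are illustrative assumptions, not parameters taken from this patent.

```python
import numpy as np

def disparity_to_depth(disparity, focal_length_px, baseline_m):
    """Convert a disparity map into a depth map using the standard
    binocular-ranging relation Z = f * b / d."""
    disparity = np.asarray(disparity, dtype=np.float32)
    depth = np.zeros_like(disparity)
    valid = disparity > 0            # zero disparity means no measurement
    depth[valid] = focal_length_px * baseline_m / disparity[valid]
    return depth

# Illustrative usage: a disparity map from a stereo pair with an assumed
# 500 px focal length and 10 cm baseline.
# depth_m = disparity_to_depth(disp, focal_length_px=500.0, baseline_m=0.10)
```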
For convenience of description, the following description assumes that one or more stereo cameras are arranged for a predetermined space to stereoscopically image an object appearing in the predetermined space and obtain a depth image or a parallax image by conversion or calculation or the like.
The predetermined space may be, for example, a single room such as an office, a supermarket, or a factory building, or may be an outdoor space such as a schoolyard or a military site, as long as the space can be a monitoring target. The object to be tracked is not particularly limited and may be a human, an animal, a flying object, or any other movable object.
As shown in fig. 1, for a newly detected first object, short-term features and long-term features of the first object are extracted using a depth image sequence including the first object, and stored as a short-term feature template and a long-term feature template of the first object in step S110.
As for the method of detecting objects, any method may be employed. For example, HOG features on RGB images may be used for detection, connected-component analysis on depth images may be used to segment and detect objects, and so on.
As is well known in the tracking art, at the beginning of tracking all detected objects are typically treated as newly detected objects. When a new object is detected, its motion characteristics can be analyzed by image processing techniques such as motion estimation, and its position in the next frame can be predicted, so that in the next frame the tracked object is searched for near the predicted position, thereby realizing tracking. When tracking fails, or at predetermined intervals, object detection may be run again; objects detected other than the already tracked objects are regarded as newly detected objects, and the above tracking process is started for them.
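A sketch of the predict-then-search step just described, using a constant-velocity motion model, which is only one simple realization of the motion estimation mentioned above; the function name and window parameter are illustrative assumptions.

```python
def predict_search_region(prev_pos, velocity, half_size):
    """Predict the object's position in the next frame under a
    constant-velocity assumption and return a square search window
    (x_min, y_min, x_max, y_max) centered on the prediction."""
    px = prev_pos[0] + velocity[0]
    py = prev_pos[1] + velocity[1]
    return (px - half_size, py - half_size, px + half_size, py + half_size)
```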
Here, the short-term features of an object are features characterizing the object that are highly time-sensitive, and the long-term features of an object are the statistical distributions over time of all or part of its short-term features.
Specifically, for example, the short-term features of an object may include one or more of the following: the position, height, and shape features of the object extracted from a single depth image frame, and the motion direction and motion magnitude of the object calculated over multiple depth image frames.
The position (x, y, z), height, and shape features of the object can be extracted directly from the depth image. In the depth image, each person comprises a number of pixel points, each with position information (xi, yi, zi); averaging the position information of the pixel points belonging to a person yields the person's position feature (x, y, z). The height feature of the person is obtained by taking the absolute difference between the maximum and minimum y values of the person's pixel points. By computing the person's contour, a chain code or Fourier descriptors of the contour can be calculated to obtain the person's shape features. A description of chain codes, Fourier descriptors, and other shape features can be found in the article "A survey of shape feature extraction techniques", Pattern Recognition, Peng-Yeng Yin (Ed.), (2008) 43-90, downloadable from http://hal-supelec.archives-ouvertes.fr/docs/00/44/60/37/PDF/ARS-Journal-SurveyPatternRecognition.pdf.
From multiple depth image frames, the person's motion information, including motion direction and motion magnitude, can be calculated. Any method for computing object motion may be used here, for example the optical flow method, the SSD method, and the like.
After the above features are extracted, they may be combined into a feature vector, for example Vector_S = {(x, y, z), height, chain_code, motion_magnitude, motion_direction}, and this feature vector may be used directly for tracking a person.
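A minimal sketch of assembling the short-term features described above from the 3D points of one detected person; the chain-code shape feature is omitted for brevity, and motion is derived here from centroid displacement between frames rather than optical flow, which is an assumed simplification.

```python
import numpy as np

def extract_short_term_features(person_points, prev_centroid=None):
    """Build the short-term features of one person from its 3D pixel
    points (xi, yi, zi). Chain-code shape features are omitted here."""
    pts = np.asarray(person_points, dtype=np.float32)       # shape (N, 3)
    position = pts.mean(axis=0)                             # mean (x, y, z)
    height = float(pts[:, 1].max() - pts[:, 1].min())       # |y_max - y_min|
    if prev_centroid is None:
        motion_magnitude = motion_direction = 0.0
    else:
        d = position[:2] - np.asarray(prev_centroid, dtype=np.float32)[:2]
        motion_magnitude = float(np.hypot(d[0], d[1]))
        motion_direction = float(np.arctan2(d[1], d[0]))
    return {"position": position, "height": height,
            "motion_magnitude": motion_magnitude,
            "motion_direction": motion_direction}
```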
The short-term features and feature vector described above are only examples. Any feature that can characterize the object in the current frame can be extracted as a short-term feature, and one or a combination of several such short-term features may be used to form the short-term feature vector.
As for the long-term features of an object, they can be characterized by counting the probability distribution of some aspect of the object's characteristics.
As described above, the present invention contemplates applying long-term features because, after a long-term loss occurs and the object reappears, short-term features such as the object's position, height, and shape (posture) may have changed greatly and thus no longer apply, whereas the statistical probability distributions of the object's short-term features are relatively stable and can therefore serve as long-term features. For example, in an office scenario, a worker walks into his work space (within a partition), sits down, and starts working. The worker disappears because of the shielding of the partition. After a longer period of time, he stands up and is present again. This is a typical long-term loss problem. To devise measures against long-term loss, first consider the worker's features before and after disappearing. Before disappearing, the worker is characterized by position A, shape B, and height C; when he reappears after a period of time, he is characterized by position D, shape E, and height F. In some cases, because the worker stays in the same workstation, the position features before disappearing and after reappearing may be adjacent, while the shape and height features may vary slightly or somewhat, depending on the worker's pose and so on. In other cases, there is a relatively long interval between disappearance and reappearance, during which the worker's position, movement, posture, etc. change, so much uncertainty can arise. In such cases, short-term features are of no use in coping with the long-term loss problem.
In this case, it is conceivable to use the probability distribution of a certain characteristic of the object over the multi-frame depth images as a long-term feature to cope with such uncertainty.
For example, the statistical distribution over time of all or part of the short-term features of the object may be characterized by a Gaussian distribution and used as a long-term feature of the object.
Specifically, for example, for each short-term feature, the maximum, minimum, mean, and variance of that feature over multiple frames are counted, and a Gaussian distribution of the feature is established from these statistics as the long-term feature of the object corresponding to that short-term feature; see formulas (1) and (2) below.
In the following, the height feature of a person is taken as an example; the variable X then represents the person's height. Formula (1) gives the mean of the person's heights collected over the multi-frame depth images, and formula (2) gives the variance of the heights:

X̄ = (X₁ + X₂ + … + Xₙ) / n    (1)

S² = [ (X₁ − X̄)² + … + (Xₙ − X̄)² ] / (n − 1)    (2)

where n denotes the number of depth image frames used, Xᵢ is the height of the person measured in the i-th frame, and X̄ is the sample mean of the Xᵢ, serving as an estimate of the mathematical expectation μ.
The long-term features of the object have been described above taking the Gaussian distribution of a short-term feature as an example, but long-term features are not limited to Gaussian distributions; other types of probability distribution, such as the uniform, exponential, or gamma distribution, may be adopted as needed in practice.
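The following sketch fits the per-feature Gaussian of formulas (1) and (2) from the values of one short-term feature collected over multiple frames; the dictionary layout of the resulting template is an illustrative assumption.

```python
import numpy as np

def build_long_term_feature(values):
    """Gaussian long-term feature of one short-term feature: the min,
    max, sample mean (formula (1)) and sample variance (formula (2))
    of its values over multiple frames."""
    x = np.asarray(values, dtype=np.float64)
    return {"min": float(x.min()), "max": float(x.max()),
            "mean": float(x.mean()),          # formula (1)
            "var": float(x.var(ddof=1))}      # formula (2)

# e.g. heights observed over n frames:
# height_model = build_long_term_feature([1.71, 1.73, 1.72, 1.74])
```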
After the short-term features and long-term features of the object are extracted, they may be stored as short-term feature templates and long-term feature templates of the object.
As time progresses the image frames keep changing, and the short-term and long-term feature templates of an object can be updated as needed. For example, the short-term or long-term features may be updated every frame or at predetermined intervals, and the short-term and long-term feature templates may be updated at the same or at different intervals. Alternatively, because in this example the long-term feature template is used to recover a lost person only when object tracking is found to be lost, the long-term feature template may be updated at the moment the loss is found. Preferably, the short-term feature template is updated more often, since it is generally used more frequently.
In step S120, during tracking of the first object, for each frame of the image to be analyzed, tracking is performed based on matching the extracted short-term features of candidate objects against the stored short-term feature template of the first object.
Specifically, for example, the region where the first object should be located in the current frame may be estimated from the object's motion information; then, within the estimated region, candidate objects and their short-term features are extracted according to a predetermined method, and the first object is tracked by matching the short-term features of each candidate against the stored short-term feature template of the first object, for example by calculating the difference between the two and comparing it with a threshold to determine whether the candidate is the first object.
Of course, the determination of whether there is a match may also be based on other methods of estimating the confidence of the match between the candidate object and the first object.
In conventional processing, matches with large differences or low confidence are treated as losses.
According to one embodiment of the invention, low-confidence matches may be analyzed further. Given the complexity of the background and the motion of people, low confidence is quite likely caused by occlusion. If the low confidence of the match is indeed caused by occlusion, it is reasonable to judge the match as successful. The short-term matching process according to an embodiment of the present invention is described in detail below with reference to figs. 2 to 4.
In step S130, if matching based on the short-term feature template of the first object fails during tracking of the first object, the first object is identified as a missing first object.
For example, if in one or more frames no candidate matching the first object can be found within the tracking area, tracking of the first object is considered lost, and the first object is identified as a missing first object.
Optionally, but not necessarily, upon recognizing that the first object is missing, the stored short-term and long-term feature templates of the first object may be updated with the most recent short-term and long-term features of the first object.
In step S140, for a subsequent newly detected second object, short-term features and long-term features of the second object are extracted using a depth image sequence including the second object.
For a newly detected object, its short-term features and long-term features can be extracted using methods similar to those described above. A short-term feature template and a long-term feature template may also be stored for the newly detected object.
In step S150, it is determined whether the second object is the missing first object based on a match between the short-term and long-term features of the second object and the short-term and long-term feature templates of the first object identified as missing.
A detailed implementation of step S150 according to an embodiment of the present invention will be described with reference to fig. 5.
The tracked object, such as a person, may be output to a display device for display, or may be output to a memory, hard disk, or remote device for subsequent processing as desired.
According to the above-described embodiments of the present invention, for an object whose tracking is found to be lost, particularly one missing for a long time, the missing object can be retrieved and tracking restored by matching the short-term and long-term features of a newly detected object against the short-term and long-term feature templates of the object identified as missing.
2.2 matching processing based on short-term features
The matching process based on the extracted short-term features of the candidate object and the stored short-term feature template of the first object according to the embodiment of the present invention is described below with reference to fig. 2.
Fig. 2 shows a flow diagram of a matching process 120 based on the extracted short-term features of the candidate object and the stored short-term feature template of the first object, according to an embodiment of the invention. This matching process may be applied to step S120 in fig. 1.
As shown in fig. 2, in step S121, the confidence of the matching of the short-term features of the candidate object and the short-term feature template of the first object is determined.
Specifically, for example, the sum of absolute differences between the short-term features of the candidate and the short-term feature template of the first object may be calculated and normalized to the range 0 to 1 (for example, by dividing by a predetermined constant) to obtain the confidence. This is merely an example, however; any method of determining the degree or probability of match between the short-term features of the candidate and the short-term feature template of the first object may be used in the present invention.
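A minimal sketch of the confidence computation just described; the normalization constant, and the inversion that makes identical features yield confidence 1.0, are assumptions, since the text leaves the exact mapping open.

```python
import numpy as np

def match_confidence(candidate, template, norm_constant=10.0):
    """Confidence of a short-term match: sum of absolute feature
    differences, scaled into [0, 1] and inverted so that identical
    features give confidence 1.0. norm_constant is an assumed scale."""
    diff = 0.0
    for key, tmpl_value in template.items():
        diff += float(np.abs(np.asarray(candidate[key]) -
                             np.asarray(tmpl_value)).sum())
    return max(0.0, 1.0 - diff / norm_constant)
```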
In step S122, it is determined whether the confidence is higher than a threshold.
If the confidence is higher than the threshold, the process proceeds to step S124, and the matching is considered to be successful, i.e., the first object is tracked.
Conversely, if the confidence is lower than the threshold, the flow proceeds to step S123, where the type of feature difference responsible for the low confidence is determined. Specifically, for example, the differences in the shape, height, and motion features are calculated to determine which feature differs most; the most different feature is the most likely cause of the low matching confidence. The flow then proceeds to step S125.
In step S125, it is determined whether the low confidence is caused by occlusion and/or motion based on a predetermined rule, and then it is determined whether there is a match.
Fig. 3 shows a schematic diagram of decision-tree judgment based on the environment and the motion of a person according to an embodiment of the invention, showing two scenarios.
The first scenario (the left branch of the decision tree) is as follows: if the shape feature of the candidate changes, the change is in the lower half, the height feature changes little, and the candidate is moving horizontally, then although the candidate's matching confidence is low, the match is judged successful.
The reason is that, because of the candidate's horizontal motion and the complex background, the lower half of the candidate is occluded, causing a large change in its shape feature; the match can therefore be judged successful.
The second scenario (the right branch of the decision tree) is as follows: if the candidate's shape feature changes greatly, its height feature decreases, and the candidate is moving vertically, then although the matching confidence is low, the match is also judged successful.
The reason is that in this case the vertical motion of the candidate (sitting down, bending over) greatly changes its shape and height features; the match can therefore be judged successful.
Fig. 4 shows a schematic diagram of decision-tree judgment based on the motion of a person and of the persons in his vicinity according to an embodiment of the invention, in which two scenarios are shown. Features marked with a checkered background in the figure belong to a neighboring candidate rather than to the current candidate.
The first scenario (the left branch of the decision tree) is as follows: if the candidate's shape feature changes greatly, the candidate itself is not moving, and a neighboring candidate is moving horizontally, then although the candidate's matching confidence is low, the match is judged successful.
The reason is that the candidate is occluded by the horizontal movement of its neighboring candidate, so its shape feature changes relatively greatly; the match can therefore be judged successful.
The second scenario (the right branch of the decision tree) is as follows: if the candidate's shape feature changes greatly, the candidate is moving horizontally, and neighboring candidates are also moving horizontally, then although the matching confidence is low, the match is judged successful.
The reason is that the candidate is occluded because of the horizontal movement of itself and its neighboring candidates, so its shape feature changes relatively greatly; the match can therefore be judged successful.
Figs. 3 and 4 are only specific examples of decision-tree analysis for the case of low-confidence short-term matching; different rules can be designed and applied according to needs and specific situations.
If it is determined in step S125 that the matching fails, the flow proceeds to step S126; otherwise, if the matching succeeds, it proceeds to step S124.
Handling these specific low-confidence matches with rule-based decision trees can effectively reduce short-term tracking failures caused by occlusion.
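A sketch of the decision-tree rules of Figs. 3 and 4 as plain conditional logic; all field names and thresholds are assumed for illustration, and a real deployment would tune the rules to the scene.

```python
# Assumed thresholds for deciding that a change is "large".
SHAPE_THRESH = 0.3
HEIGHT_THRESH = 0.1

def occlusion_explains_low_confidence(cand, neighbor=None):
    """Return True when a low-confidence match is plausibly explained
    by occlusion or vertical motion, per the scenarios of Figs. 3-4."""
    shape_changed = cand["shape_change"] > SHAPE_THRESH

    # Fig. 3, left branch: lower-half shape change, stable height,
    # horizontal motion -> occluded by a background object.
    if (shape_changed and cand["lower_half_changed"]
            and abs(cand["height_change"]) < HEIGHT_THRESH
            and cand["moving_horizontally"]):
        return True

    # Fig. 3, right branch: large shape change, decreased height,
    # vertical motion -> sitting down or bending over.
    if (shape_changed and cand["height_change"] < -HEIGHT_THRESH
            and cand["moving_vertically"]):
        return True

    # Fig. 4, both branches: a horizontally moving neighbor can occlude
    # the candidate, whether or not the candidate itself is moving.
    if (shape_changed and neighbor is not None
            and neighbor["moving_horizontally"]):
        return True

    return False
```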
2.3 recovery of long-term lost object tracking based on matching of hybrid short-term and long-term feature templates
How to recover tracking of long-term missing objects based on matching of mixed short-term and long-term feature templates according to an embodiment of the present invention is described below with reference to fig. 5.
FIG. 5 illustrates a flow diagram of a method 150 for recovering tracking of long term missing objects based on matching of a hybrid short term feature template and a long term feature template in accordance with an embodiment of the present invention. The method 150 may be used in step S150 shown in fig. 1.
As shown in fig. 5, in step S151, the short-term features of the second object are matched with the short-term feature template of the first object.
As the matching method, for example, the matching method described with reference to fig. 2 to 4 is employed. Of course, any other method of matching features to feature templates is possible.
In step S152, it is determined whether they match. If they match, the flow proceeds to step S153: after the first object's tracking was lost, tracking of the first object is restored through the match between the short-term features of the newly detected second object and those of the first object. If they do not match, the flow proceeds to step S154.
In step S154, the long-term features of the second object are matched with the long-term feature template of the first object.
Specifically, when Gaussian distributions of one or more characteristics of the object serve as one or more long-term features, the statistical similarity between the Gaussian distribution serving as a long-term feature of the second object and the Gaussian distribution serving as the long-term feature template of the first object can be assessed by confidence-interval estimation; if the statistical similarity of the two Gaussian distributions is greater than a predetermined threshold, the second object and the first object are considered matched; otherwise they are considered not matched.
Thus, the matching judgment for the two long-term features is converted into judging whether two Gaussian distributions match. The first Gaussian distribution comes from the long-term missing person, i.e., the first object, and is stored in the repository as a long-term tracking model; the second Gaussian distribution comes from the currently newly detected person who did not pass the short-term match.
Matching based on long-term features thus judges, through the similarity of the two Gaussian distributions, whether the newly detected person is the long-term missing person.
The above operation is based on the following consideration: for measurements of the same person in the same scene, the "variance" parameters of the first and second Gaussian distributions should be statistically similar.
As an example, the confidence interval may be calculated from the F distribution according to the following formula (3):

( (S₁²/S₂²) / F_{α/2}(n₁−1, n₂−1) , (S₁²/S₂²) · F_{α/2}(n₂−1, n₁−1) )    (3)

where the first term is the lower confidence limit of the confidence interval for the variance ratio σ₁²/σ₂², the second term is its upper confidence limit, F(n₁−1, n₂−1) is an F distribution, n₁ and n₂ are respectively the numbers of samples in the first and second sample sets used to estimate the parameters of the first and second Gaussian distributions, S₁ is the standard deviation of the first sample set, S₂ is the standard deviation of the second sample set, F_{α/2} denotes the upper α/2 critical value of the F distribution, and 1−α indicates the statistical confidence level.
As an example, determining whether the first and second Gaussian distributions are similar based on the statistical similarity may include: if the confidence interval includes 1.0, the two Gaussian distributions are determined to be statistically similar; if the confidence interval does not include 1.0, they are determined to be statistically dissimilar. For example, if the confidence interval is (0.7, 2), which includes 1.0, the first and second Gaussian distributions are determined to be statistically similar and the newly detected candidate is the long-term missing person; if the confidence interval is (0.6, 0.9), which excludes 1.0, the distributions are determined not to be statistically similar and the newly detected candidate is not the long-term missing person; likewise, if the confidence interval is (1.5, 4), which excludes 1.0, the distributions are determined to be statistically dissimilar and the newly detected candidate is not the long-term missing person.
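A sketch of the confidence-interval test above using scipy.stats.f; it implements the standard two-sample variance-ratio interval of formula (3) and the "contains 1.0" decision rule.

```python
from scipy.stats import f

def variance_ratio_ci(s1_sq, n1, s2_sq, n2, alpha=0.05):
    """Confidence interval for sigma1^2 / sigma2^2 at level 1 - alpha,
    per formula (3), from the sample variances and sample counts."""
    ratio = s1_sq / s2_sq
    lower = ratio / f.ppf(1.0 - alpha / 2.0, n1 - 1, n2 - 1)
    upper = ratio * f.ppf(1.0 - alpha / 2.0, n2 - 1, n1 - 1)
    return lower, upper

def gaussians_statistically_similar(s1_sq, n1, s2_sq, n2, alpha=0.05):
    """The two Gaussians are judged similar when the interval contains 1.0."""
    lo, hi = variance_ratio_ci(s1_sq, n1, s2_sq, n2, alpha)
    return lo <= 1.0 <= hi
```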
The above example describes using the F distribution to calculate a confidence interval for judging whether the first and second Gaussian distributions are statistically similar. This is merely an example, however; any means capable of determining whether two distributions are similar may be used in the present invention.
After step S154, the process proceeds to step S155, where it is determined whether the second object and the first object match, and if so, the process proceeds to step S153, where the tracking of the first object is resumed, and if not, the process proceeds to step S156, where it is determined that the second object is not the first object.
The above process 150 then ends.
3. Second embodiment
3.1 calculation of weights for short-term features
The inventors consider that, over time and as circumstances change, the status or role of different features in matching may change, so it is preferable to weight the features. For example, the difference between a feature of the candidate object and the same feature of its neighboring area changes dynamically, and the degree of change differs from feature to feature. Take the shape feature as an example: if a tracked person stands in front of a white wall, the person's shape feature differs greatly from the wall's, the shape feature can be considered highly salient, and it is then a very effective tracking feature; if the person stands in front of a portrait, the shape feature is no longer an effective tracking feature, because the shape feature of the portrait adjacent to the tracked person is very similar to the person's. By analogy, similar situations can occur for the height feature, the motion features, and so on.
The basic idea of the processing in this embodiment of the present invention is that the saliency of a feature varies dynamically with the scene adjacent to each tracked object. Therefore, the features of the tracked object and of its neighboring area are computed; for each feature, the difference between the tracked object and the neighboring area is calculated; and the weights of the features used for tracking are determined jointly from the difference values of the several features.
Thus, the weight adjustment of the short-term features may be performed as follows according to one embodiment of the present invention.
The weight of each short-term feature is determined from the difference, with respect to that feature, between the tracked object and its vicinity at the current time (or in the current frame): if a short-term feature differs greatly between the tracked object and its vicinity, its weight is large; if the difference is small, its weight is small.
For example, let Shape_Diff be the difference between the shape feature of the tracked person and the shape feature of its vicinity, Motion_Diff the corresponding difference for the motion feature, and Height_Diff the corresponding difference for the height feature. The weight of the shape feature may then be calculated, for example, as the normalized saliency of formula (4):

weight_shape = Shape_Diff / (Shape_Diff + Motion_Diff + Height_Diff)    (4)
Based on a similar formula, weights for features other than shape can be calculated.
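A sketch of the saliency-based weighting of formula (4), extended to all three features as the surrounding text suggests; the small epsilon guarding against a zero denominator is an added assumption.

```python
def short_term_feature_weights(shape_diff, motion_diff, height_diff,
                               eps=1e-6):
    """Normalized saliency weights per formula (4): each feature's
    difference from the vicinity divided by the total difference."""
    total = shape_diff + motion_diff + height_diff + eps
    return {"shape": shape_diff / total,
            "motion": motion_diff / total,
            "height": height_diff / total}
```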
After the weights of the short-term features are calculated, they can be stored in association with the short-term features as part of the short-term feature template.
After the weights of the short-term features are determined in this way, when tracking the object in the next frame, matching between the short-term features of the tracked object and the short-term feature template of the relevant object may be performed using the newly determined weights.
According to the above embodiment, the weights of the short-term features are dynamically adjusted based on the degree of saliency of each short-term feature of the object relative to its vicinity, so that object tracking can be performed more accurately.
3.2 calculation of weights for Long-term features
The inventors' insight is that, when the probability distribution over time of a short-term feature is adopted as a long-term feature, the long-term features corresponding to short-term features that are relatively stable and change less strongly are better suited to recovering objects lost for a long time, and should therefore be given higher weight.
To this end, according to one embodiment of the invention, the weights of the long-term features are determined using the following method.
The weight of each long-term feature, i.e., of the statistical distribution of each short-term feature, is determined according to the degree of stability of that short-term feature over the occasions on which a specific object was lost at a certain position.
In particular, the identity of the specific object is stored in association with the position at which tracking was lost and the number of times tracking was lost at that position.
If the number of times the specific object has been lost at the given position exceeds a predetermined threshold, the mean and variance of each short-term feature over those loss events are counted, and from them the degree of stability of each short-term feature is determined.
A high weight is given to the long-term features that are the statistical distributions of the highly stable short-term features.
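A sketch of the stability-based weighting just described; the inverse-variance rule is an assumed concrete realization of "more stable features get higher weight", not a formula given by the text.

```python
import numpy as np

def long_term_feature_weights(loss_history, min_losses=3):
    """loss_history maps a feature name to the values it had each time
    the object was lost at this position. Features with low variance
    across losses (i.e. stable features) receive higher weights."""
    variances = {name: float(np.var(vals, ddof=1))
                 for name, vals in loss_history.items()
                 if len(vals) >= min_losses}
    inverse = {name: 1.0 / (v + 1e-6) for name, v in variances.items()}
    total = sum(inverse.values())
    return {name: w / total for name, w in inverse.items()} if total else {}
```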
After the weights of the long-term features of the specific object are determined in this way, they may be stored in association with the long-term feature template. When the specific object is lost, matching between the long-term features of a newly detected object and the long-term feature template of the lost object can then be performed using the weights of the long-term features. According to this embodiment, object tracking and recovery from tracking loss can be performed more accurately.
4. Object tracking apparatus
Fig. 6 is a functional configuration block diagram of an object tracking apparatus 6000 according to one embodiment of the present invention.
As shown in fig. 6, the object tracking apparatus 6000 may include: a first object template determination section 6100 that, for a newly detected first object, extracts short-term features and long-term features of the first object from a depth image sequence containing the first object and stores them as a short-term feature template and a long-term feature template of the first object, where the short-term features of an object are features characterizing the object that are highly time-sensitive, and the long-term features of an object are the statistical distributions over time of all or part of its short-term features; a first object tracking section 6200 that, during tracking of the first object, tracks, for each frame of the image to be analyzed, based on matching the extracted short-term features of candidate objects against the stored short-term feature template of the first object; a missing object identification section 6300 that identifies the first object as a missing first object if matching based on the short-term feature template of the first object fails during tracking of the first object; a second object feature extraction section 6400 that, for a subsequently newly detected second object, extracts short-term features and long-term features of the second object from a depth image sequence containing the second object; and a missing object recovery section 6500 that determines whether the second object is the missing first object based on matching between the short-term and long-term features of the second object and the short-term and long-term feature templates of the first object identified as missing.
For the specific functions and operations of the first object template determination section 6100, the first object tracking section 6200, the missing object identification section 6300, the second object feature extraction section 6400, and the missing object recovery section 6500 described above, reference may be made to the description of figs. 1 to 5. Repeated description is omitted here.
Optionally, the object tracking apparatus according to the embodiment of the present invention may further include a template updating section (not shown) for dynamically updating the short-term and long-term feature templates of the tracked object as tracking progresses, for use in subsequent tracking.
Optionally, the object tracking apparatus according to an embodiment of the present invention may further include a feature weight determination section (not shown).
The feature weight determination section may determine the weight of each short-term feature based on the difference, with respect to that feature, between the tracked object and its vicinity at the current time, such that the weight of a short-term feature is larger if the difference between the tracked object and its vicinity is larger, and smaller if the difference is smaller. Thus, at the next moment, the object tracking apparatus of the embodiment of the invention can match the short-term features of the tracked object against the short-term templates of the relevant objects using the weights of the short-term features.
Further, the feature weight determination section may determine the weight of each long-term feature, i.e., of the statistical distribution of each short-term feature, according to the degree of stability of that short-term feature over the occasions on which the specific object was lost at a certain position. This includes: storing the identity of the specific object in association with the position at which tracking was lost and the number of times tracking was lost at that position; if the number of losses of the specific object at the given position exceeds a predetermined threshold, counting the mean and variance of each short-term feature over those loss events to determine the degree of stability of each short-term feature; and giving high weights to the long-term features that are the statistical distributions of the highly stable short-term features.
With the object tracking apparatus according to the embodiment of the present invention, for an object found to be missing, particularly one missing for a long time, the missing object can be found and tracking restored by matching the short-term and long-term features of a newly detected object against the short-term and long-term feature templates of the object identified as missing.
5. System hardware configuration
The present invention may also be implemented as an object tracking hardware system. Fig. 7 is an overall hardware block diagram illustrating an object tracking system 1000 according to an embodiment of the present invention. As shown in fig. 7, the object tracking system 1000 may include: an input device 1100 for inputting relevant images or information from outside, such as a disparity map of an object, a depth map of an object, or images taken by multiple cameras, which may include, for example, a keyboard, a mouse, and a communication network with connected remote input devices; a processing device 1200 that implements the above object tracking method according to an embodiment of the present invention, or is implemented as the above object tracking apparatus, for example a central processor or another chip with processing capability, which may include a computer, may be connected to a network (not shown) such as the Internet, and may remotely transmit the object images resulting from tracking as the processing requires; an output device 1300 for outputting the results of the above object tracking process to the outside, which may include, for example, a display, a printer, and a communication network with connected remote output devices; and a storage device 1400 for storing, in a volatile or non-volatile manner, the images, short-term features, short-term feature templates, long-term features, long-term feature templates, weights of short-term features, weights of long-term features, and the like involved in the above object tracking process, for example various volatile or non-volatile memories such as random access memory (RAM), read-only memory (ROM), a hard disk, or semiconductor memory.
6. Summary of the invention
According to an embodiment of the present invention, there is provided a method of tracking one or more objects using a depth image or a parallax image, including: for a newly detected first object, extracting short-term features and long-term features of the first object using a depth image sequence including the first object, and storing them as a short-term feature template and a long-term feature template of the first object, wherein the short-term features of an object represent highly time-sensitive features characterizing the object, and the long-term features of an object represent the statistical distribution over time of all or part of the short-term features of the object; in the tracking process of the first object, for each frame of an image to be analyzed, tracking based on matching of the extracted short-term features of a candidate object with the stored short-term feature template of the first object; identifying the first object as a missing first object if the matching based on the short-term feature template of the first object fails during the tracking of the first object; for a subsequent newly detected second object, extracting short-term features and long-term features of the second object using a depth image sequence including the second object; and determining whether the second object is the missing first object based on a match between the short-term and long-term features of the second object and the short-term and long-term feature templates of the first object identified as missing.
According to another embodiment of the present invention, there is provided an apparatus for tracking one or more objects using a depth image or a parallax image, including: a first object template determination section that, for a newly detected first object, extracts short-term features and long-term features of the first object using a depth image sequence including the first object and stores them as a short-term feature template and a long-term feature template of the first object, wherein the short-term features of an object represent highly time-sensitive features characterizing the object, and the long-term features of an object represent the statistical distribution over time of all or part of the short-term features of the object; a first object tracking section that, in the tracking process of the first object, performs tracking for each frame of an image to be analyzed based on matching of the extracted short-term features of a candidate object with the stored short-term feature template of the first object; a missing object identification section that identifies the first object as a missing first object if the matching based on the short-term feature template of the first object fails in the tracking of the first object; a second object feature extraction section that, for a subsequent newly detected second object, extracts short-term features and long-term features of the second object using a depth image sequence including the second object; and a lost object recovery section that determines whether the second object is the missing first object based on a match between the short-term and long-term features of the second object and the short-term and long-term feature templates of the first object identified as missing.
According to the method and apparatus of the above-described embodiments of the present invention, for an object whose tracking is lost, especially one lost for a long time, the lost object can be retrieved and tracking restored by matching the short-term and long-term features of a newly detected object with the short-term and long-term feature templates of the object identified as lost.
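To make the track-lose-recover flow of the above method concrete, the following minimal control-flow sketch strings the five steps together. The matcher callables, the feature extractor, and the data layout are placeholders standing in for the template matching described above, not an implementation of the claimed method.

```python
def process_frame(frame, tracked, missing, match_short, match_long,
                  extract_features):
    """One step of the track / lose / recover loop.

    tracked, missing: dicts mapping an object id to its stored
    (short_template, long_template) pair. match_short / match_long:
    placeholder predicates comparing extracted features to a template.
    extract_features(frame): returns (short, long) feature pairs for
    each newly detected candidate object in the frame.
    """
    candidates = extract_features(frame)

    # Track each known object by short-term template matching; if no
    # candidate matches, identify the object as missing.
    for obj_id, (s_tpl, _l_tpl) in list(tracked.items()):
        if not any(match_short(s, s_tpl) for s, _ in candidates):
            missing[obj_id] = tracked.pop(obj_id)

    # For each newly detected candidate, try to recover a missing object
    # via its short-term and then its long-term template.
    for short, long_ in candidates:
        for obj_id, (s_tpl, l_tpl) in list(missing.items()):
            if match_short(short, s_tpl) or match_long(long_, l_tpl):
                tracked[obj_id] = missing.pop(obj_id)  # tracking restored
                break
```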
The foregoing description is merely illustrative and many modifications and/or substitutions may be made.
The foregoing exemplary description has taken a person as the tracked object. However, this is merely an example; the present invention is not limited thereto, and any object, such as an animal or a car, may be the tracking target.
In the foregoing exemplary description, the detection and tracking of the object are implemented by matching the extracted short-term features of the object against the short-term feature template. However, the present invention is not limited thereto; the detection and tracking of the object may also be performed using, for example, a method of modeling the object or a decision-tree method.
In the foregoing exemplary description, when recovering a long-lost person, whether the second object is the missing first object is determined based on a match of the newly detected second object's short-term and long-term features with the short-term and long-term feature templates of the first object identified as missing. However, this is merely an example; it may also be determined whether the second object is the missing first object based only on a match of the newly detected second object's long-term features with the long-term feature template of the first object identified as missing.
In addition, in the foregoing exemplary description, the short-term feature types and the short-term feature matching method used in tracking the object are the same as those used in recovering the long-lost person. However, this is merely an example; the present invention is not limited thereto, and the short-term feature types and matching methods used in tracking and in recovering a lost person may be the same or different.
In addition, in the foregoing exemplary description, a Gaussian distribution of the short-term features is taken as the long-term features. However, this is merely an example; other probability distributions suitable for describing the statistical characteristics of an object, such as a uniform distribution, an exponential distribution, or a gamma distribution, may be selected according to need and the practical application.
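As a worked illustration of the Gaussian choice: a long-term feature can be fitted as the mean and variance of a short-term feature's history, and two such Gaussians can then be compared when deciding whether a newly detected object is the lost one. The sketch below substitutes the Bhattacharyya coefficient for the confidence-interval estimation recited in claim 6; that substitution, the threshold, and the numbers are assumptions.

```python
import numpy as np

def fit_gaussian(values):
    """Long-term feature: a Gaussian over a short-term feature's history."""
    values = np.asarray(values, dtype=float)
    return values.mean(), values.var() + 1e-6  # variance floored for stability

def gaussian_similarity(mu1, var1, mu2, var2):
    """Bhattacharyya coefficient between two 1-D Gaussians, in (0, 1];
    equals 1 for identical distributions."""
    bd = 0.25 * (mu1 - mu2) ** 2 / (var1 + var2) \
         + 0.5 * np.log((var1 + var2) / (2.0 * np.sqrt(var1 * var2)))
    return float(np.exp(-bd))

# Example: height history of the lost object vs. a newly detected one.
template = fit_gaussian([1.74, 1.76, 1.75, 1.77])
candidate = fit_gaussian([1.75, 1.73, 1.76])
is_same_object = gaussian_similarity(*template, *candidate) > 0.9  # assumed threshold
```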
While the principles of the invention have been described in connection with specific embodiments thereof, it should be noted that it will be understood by those skilled in the art that all or any of the steps or elements of the method and apparatus of the invention may be implemented in any computing device (including processors, storage media, etc.) or network of computing devices, in hardware, firmware, software, or any combination thereof, which will be within the skill of those in the art after reading the description of the invention and applying their basic programming skills.
Thus, the objects of the invention may also be achieved by running a program or a set of programs on any computing device, which may be a well-known general-purpose device. The objects of the invention are thus also achieved merely by providing a program product containing program code for implementing the method or apparatus; that is, such a program product constitutes the present invention, and a storage medium storing such a program product also constitutes the present invention. The storage medium may be any known storage medium or any storage medium developed in the future.
It is further noted that, in the apparatus and method of the present invention, each component or step can evidently be decomposed and/or recombined; these decompositions and/or recombinations are to be regarded as equivalents of the present invention. Also, the steps of the series of processes described above may naturally be executed chronologically in the order described, but they need not be; some steps may be performed in parallel or independently of one another.
The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (10)
1. A method of tracking one or more objects using depth images or parallax images, comprising:
for a newly detected first object, extracting short-term features and long-term features of the first object using a depth image sequence including the first object, and storing them as a short-term feature template and a long-term feature template of the first object, wherein the short-term features of an object represent highly time-sensitive features characterizing the object, and the long-term features of an object represent the statistical distribution over time of all or part of the short-term features of the object;
in the tracking process of the first object, for each frame of an image to be analyzed, tracking based on matching of the extracted short-term features of a candidate object with the stored short-term feature template of the first object;
identifying the first object as a missing first object if the matching based on the short-term feature template of the first object fails during the tracking of the first object;
for a subsequent newly detected second object, extracting short-term features and long-term features of the second object by using a depth image sequence comprising the second object; and
determining whether the second object is the missing first object based on a match between the short-term and long-term features of the second object and the short-term and long-term feature templates of the first object identified as missing.
2. The method of claim 1, wherein the short-term features of the object comprise one or more of: the position, height, and shape features of the object extracted from a single-frame depth image, and the motion direction and motion amplitude of the object calculated across multiple frames of depth images.
3. The method according to claim 1, wherein the statistical distribution over time of all or part of the short-term features of the object is characterized by a Gaussian distribution as the long-term features of the object.
4. The method of claim 1, wherein said determining whether the second object is a missing first object based on a match between the short-term and long-term features of the second object and the short-term and long-term feature templates of the first object identified as missing comprises:
matching the short-term features of the second object with the short-term feature template of the first object;
if the two match, determining that the second object is the missing first object; and
if the two do not match, matching the long-term features of the second object with the long-term feature template of the first object to determine whether the second object is the missing first object.
5. The method of claim 4, wherein matching the short-term features of the second object with the short-term feature template of the first object comprises:
determining the confidence of the match between the short-term features of the second object and the short-term feature template of the first object;
if the confidence is higher than a predetermined threshold, determining that the two match; and
if the confidence is lower than the predetermined threshold, judging, based on a predetermined rule, whether the low confidence is caused by occlusion; if so, determining that the two match, and otherwise determining that the two do not match.
6. The method of claim 4, wherein matching the long-term features of the second object with the long-term feature template of the first object to determine whether the second object is a missing first object comprises:
calculating, through confidence interval estimation, the statistical similarity between the Gaussian distribution serving as the long-term features of the second object and the Gaussian distribution serving as the long-term feature template of the first object, determining that the second object is the missing first object if the statistical similarity of the two Gaussian distributions is greater than a predetermined threshold, and otherwise determining that the second object is not the missing first object.
7. The method of claim 1, further comprising: dynamically updating, as tracking proceeds, the short-term feature template and the long-term feature template of the tracked object for use in subsequent tracking.
8. The method of claim 1, further comprising:
determining the weight of each short-term feature based on the difference, with respect to that feature, between the tracking object and its vicinity at the current moment, such that the weight of a short-term feature is larger if the difference between the tracking object and its vicinity is larger, and smaller if the difference is smaller; and
at the next moment, performing matching between the short-term features of the tracked object and the short-term templates of related objects based on the weights of the short-term features.
9. The method of claim 1, further comprising:
determining a weight of each long-term feature, each long-term feature being the statistical distribution of a corresponding short-term feature, wherein the weight of each long-term feature is determined according to the degree of stability of each short-term feature when a specific object is lost a plurality of times at a certain position, the determining comprising:
storing, in association with the identity of the specific object, the position at which tracking was lost and the number of times tracking has been lost at that position;
if the number of times the specific object has been lost at the determined position is greater than a predetermined threshold, computing the mean and variance of each short-term feature over the multiple losses at that position, thereby determining the degree of stability of each short-term feature; and
giving a high weight to the long-term features that are statistical distributions of short-term features having a high degree of stability.
10. An apparatus for tracking one or more objects using a depth image or a parallax image, comprising:
a first object template determination section that, for a newly detected first object, extracts short-term features and long-term features of the first object using a depth image sequence including the first object and stores them as a short-term feature template and a long-term feature template of the first object, wherein the short-term features of an object represent highly time-sensitive features characterizing the object, and the long-term features of an object represent the statistical distribution over time of all or part of the short-term features of the object;
a first object tracking section that, in the tracking process of the first object, performs tracking for each frame of an image to be analyzed based on matching of the extracted short-term features of a candidate object with the stored short-term feature template of the first object;
a missing object identification section that identifies the first object as a missing first object if the matching based on the short-term feature template of the first object fails in the tracking of the first object;
a second object feature extraction section that, for a subsequent newly detected second object, extracts short-term features and long-term features of the second object using a depth image sequence including the second object; and
a lost object recovery section that determines whether the second object is the missing first object based on a match between the short-term features and long-term features of the second object and the short-term feature template and long-term feature template of the first object identified as missing.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210327643.8A CN103679742B (en) | 2012-09-06 | 2012-09-06 | Method for tracing object and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103679742A CN103679742A (en) | 2014-03-26 |
CN103679742B true CN103679742B (en) | 2016-08-03 |
Family
ID=50317193
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106611030B (en) * | 2015-10-27 | 2020-05-19 | 杭州海康威视数字技术股份有限公司 | Object similarity comparison method and retrieval method based on video and system thereof |
CN107610157B (en) * | 2016-07-12 | 2020-10-09 | 深圳雷柏科技股份有限公司 | Unmanned aerial vehicle target tracking method and system |
CN107092869B (en) * | 2017-04-05 | 2019-11-26 | 武汉大学 | A kind of point target tracking of video satellite |
CN109472809B (en) * | 2017-09-06 | 2020-09-25 | 中国移动通信有限公司研究院 | Target identification method and device |
TWI637323B (en) * | 2017-11-20 | 2018-10-01 | 緯創資通股份有限公司 | Method, system, and computer-readable recording medium for image-based object tracking |
CN110555944A (en) * | 2018-05-30 | 2019-12-10 | 展讯通信(上海)有限公司 | Automatic vending machine |
JP6973302B2 (en) * | 2018-06-06 | 2021-11-24 | トヨタ自動車株式会社 | Target recognition device |
CN109035295B (en) * | 2018-06-25 | 2021-01-12 | 广州杰赛科技股份有限公司 | Multi-target tracking method, device, computer equipment and storage medium |
CN112101223B (en) * | 2020-09-16 | 2024-04-12 | 阿波罗智联(北京)科技有限公司 | Detection method, detection device, detection equipment and computer storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1202239A (en) * | 1995-11-14 | 1998-12-16 | 摩西·拉宗 | Computer stereo image system and method |
CN101141633A (en) * | 2007-08-28 | 2008-03-12 | 湖南大学 | Moving object detecting and tracing method in complex scene |
CN101142593A (en) * | 2005-03-17 | 2008-03-12 | 英国电讯有限公司 | Method of tracking objects in a video sequence |
CN102063725A (en) * | 2010-12-30 | 2011-05-18 | Tcl集团股份有限公司 | Depth information-based multi-target tracking method |
Non-Patent Citations (2)
Title |
---|
"A Tracking Algorithm for Small Targets"; Zhou Shengxin et al.; Computer Engineering; Aug. 31, 2010; Vol. 36, No. 16; full text *
"Research on Video Moving Object Segmentation Techniques"; Liu Danghui et al.; Journal of Circuits and Systems; Sep. 30, 2002; Vol. 7, No. 3; full text *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103679742B (en) | Method for tracing object and device | |
US11789545B2 (en) | Information processing device and method, program and recording medium for identifying a gesture of a person from captured image data | |
CN109076198B (en) | Video-based object tracking occlusion detection system, method and equipment | |
US9741130B2 (en) | Method and apparatus for detecting object | |
US9411037B2 (en) | Calibration of Wi-Fi localization from video localization | |
CN104599287B (en) | Method for tracing object and device, object identifying method and device | |
US8965050B2 (en) | Behavior analysis device | |
KR101380628B1 (en) | Method and apparatus for object tracking using multiple cameras | |
CN105426827A (en) | Living body verification method, device and system | |
CN103824070A (en) | Rapid pedestrian detection method based on computer vision | |
KR101460313B1 (en) | Apparatus and method for robot localization using visual feature and geometric constraints | |
CN106033601A (en) | Method and apparatus for detecting abnormal situation | |
CN103870824A (en) | Method and device for capturing face in face detecting and tracking process | |
KR101762010B1 (en) | Method of modeling a video-based interactive activity using the skeleton posture datset | |
Nguyen et al. | Confidence-aware pedestrian tracking using a stereo camera | |
KR101406334B1 (en) | System and method for tracking multiple object using reliability and delayed decision | |
CN112164093A (en) | Automatic person tracking method based on edge features and related filtering | |
CN110580708B (en) | Rapid movement detection method and device and electronic equipment | |
US20150371088A1 (en) | Information processing apparatus, information processing method, and computer-readable medium | |
JP2017097549A (en) | Image processing apparatus, method, and program | |
Baptista-Ríos et al. | Human activity monitoring for falling detection. A realistic framework | |
JP2021077177A (en) | Operation recognition apparatus, operation recognition method, and operation recognition program | |
CN114326695B (en) | Self-propelled vehicle following system and self-propelled vehicle following method | |
Lin et al. | Collaborative pedestrian tracking with multiple cameras: Data fusion and visualization | |
Khan et al. | 3D object tracking using disparity map |
Legal Events
Code | Title
---|---
PB01 | Publication
C10 | Entry into substantive examination
SE01 | Entry into force of request for substantive examination
C14 | Grant of patent or utility model
GR01 | Patent grant