WO2020215552A1 - Multi-target tracking method, apparatus, computer device, and storage medium - Google Patents
- Publication number
- WO2020215552A1 (PCT/CN2019/102318)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- area
- preset
- image
- detected
- head
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Definitions
- This application relates to the technical field of target tracking, in particular to a multi-target tracking method, device, computer equipment and storage medium.
- the intelligent video analysis monitoring system can automatically identify different objects, find abnormal situations in the monitoring screen, and can issue alarms and provide useful information in the fastest and best way, so as to more effectively assist security personnel in dealing with crises.
- Target detection is a basic function of video analysis technology, which is of great significance to the realization of follow-up target tracking, target recognition and behavior analysis applications, especially in the field of real-time target event monitoring, its importance is self-evident.
- As a non-rigid body, the human body has various morphological changes and is prone to occlusion, and video scenes change in complex and diverse ways, which makes effective detection and tracking of pedestrians in video very difficult.
- In practical application scenarios, there are problems such as varied pedestrian poses, occlusion of the human body, sudden light changes, and disturbance of the background environment, so how to track targets quickly and accurately in a video with a complex background, especially when there are occlusions between multiple targets, remains an important and difficult point in the field of video image processing technology.
- the first aspect of the present application provides a multi-target tracking method, the method includes:
- the method further includes:
- the pedestrian corresponding to the physical area is determined as the target tracking object
- the pedestrian corresponding to the head area is determined as the target tracking object
- said segmenting the occluded pedestrians according to the head area and the body area includes:
- the occluded pedestrians are segmented according to the enlarged body area.
- the method further includes:
- the central axis of the two head regions is used as the dividing line, and the key points of the shoulders are used as the boundary to divide the blocked pedestrians.
- the method of parallel processing is used to simultaneously call the preset first detection model to detect the head area in the image to be detected and call the preset second detection model to detect the body area in the image to be detected.
- the calling the preset first detection model to detect the head region in the image to be detected includes:
- the head region corresponding to each human body in the image to be detected is determined according to the multiple human body nodes of each human body.
- the calculating the area ratio based on the head area and the body area includes:
- the area ratio is calculated based on the first area and the second area.
- a second aspect of the present application provides a multi-target tracking device, the device including:
- the acquisition module is used to acquire an image to be detected including multiple targets;
- the detection module is configured to call a preset first detection model to detect the head region in the image to be detected;
- the detection module is further configured to call a preset second detection model to detect the shape area in the image to be detected;
- a calculation module configured to calculate an area ratio based on the head area and the body area
- a judging module for judging whether there is an area ratio smaller than a preset first threshold in the area ratio, wherein the preset first threshold is less than 1;
- the segmentation module is used to determine that a pedestrian in the image to be detected is occluded when there is an area ratio smaller than the preset first threshold, and to segment the occluded pedestrian according to the head area and the body area;
- the tracking module is used to call a preset tracking algorithm to track the segmented pedestrians that are blocked and those that are not blocked.
- a third aspect of the present application provides a computer device that includes a processor, and the processor is configured to implement the multi-target tracking method when executing computer-readable instructions stored in a memory.
- a fourth aspect of the present application provides a non-volatile readable storage medium having computer readable instructions stored thereon, and when the computer readable instructions are executed by a processor, the multi-target tracking method is implemented.
- the multi-target tracking method, device, computer equipment, and storage medium described in this application first acquire an image to be detected that contains multiple targets with occlusion, and respectively call a preset first detection model and a preset second detection model to detect the head area and the body area in the image to be detected.
- an area ratio of the head area to the body area is then calculated; when there is an area ratio smaller than the preset first threshold, it is determined that a pedestrian in the image to be detected is occluded, the occluded pedestrian is segmented according to the head area and the body area, and finally the preset tracking algorithm is called to track the segmented occluded pedestrian and the unoccluded pedestrians.
- This application uses the area ratio to measure how much a pedestrian is occluded, so that occluded pedestrians can be detected; in addition, the target tracking object is determined by combining the head area and the body area, which reduces missed detections or false detections caused by the pedestrian's body being occluded and improves the effect of target tracking. It can therefore be applied in scenes with complex backgrounds, and can track targets quickly and accurately, especially when there are occlusions between multiple targets, which has high practical value.
- FIG. 1 is a flowchart of a multi-target tracking method provided in Embodiment 1 of the present application.
- FIG. 2 is a structural diagram of a multi-target tracking device provided in Embodiment 2 of the present application.
- FIG. 3 is a schematic diagram of the structure of a computer device provided in Embodiment 3 of the present application.
- FIG. 1 is a flowchart of a multi-target tracking method provided in Embodiment 1 of the present application.
- the multi-target tracking method can be applied to a computer device.
- the multi-target tracking function provided by the method of this application can be directly integrated on the computer device, or it can run on the computer device in the form of a Software Development Kit (SDK).
- the multi-target tracking method specifically includes the following steps. According to different requirements, the order of the steps in the flowchart can be changed, and some of the steps can be omitted.
- the image to be detected may be any suitable image that requires target tracking, for example, an image collected for a monitored area.
- the image to be detected may be a static image collected by an image collection device such as a camera, or any video frame in a video collected by an image collection device such as a camera.
- the image to be detected may be an original image or an image obtained after preprocessing the original image.
- the image to be detected contains multiple pedestrians, and the body parts of the multiple pedestrians may overlap significantly. That is, the target tracking object is determined even when the body parts of multiple pedestrians overlap heavily, so as to prevent misdetection or missed detection caused by a pedestrian being blocked by other pedestrians.
- the first detection model can be trained in advance, and by directly calling the pre-trained first detection model, multiple human body nodes of each human body in the image to be detected can be detected directly and quickly.
- the preset first detection model may be various detection models based on deep learning, for example, a detection model based on a neural network, or a detection model based on a residual network.
- the method further includes:
- the first detection model is trained in advance, wherein the training process of the first detection model includes:
- tools such as OpenPose or PoseMachine are used to label multiple human body nodes in the head region of the human body pictures, for example, the left-eye node, right-eye node, left-ear node, and right-ear node.
- Extract a first preset ratio of human body pictures as the sample picture set to be trained (referred to as the training set), and extract the second preset ratio of human body pictures as the sample picture set to be verified (referred to as the verification set for short).
- the number of human body pictures in the training set is much larger than the number of human body pictures in the verification set; for example, 80% of the human body pictures are used as the training set, and the remaining 20% are used as the verification set.
- the parameters of the neural network initially adopt the default parameters, and the parameters are then adjusted continuously during the training process.
- the generated first detection model is verified using the human body pictures in the verification set. If the verification pass rate is greater than or equal to a preset threshold, for example, the pass rate is greater than or equal to 98%, the training ends.
- the first detection model obtained by this training is used to identify the human body node. If the verification pass rate is less than the preset threshold, for example, less than 98%, the number of human body pictures participating in the training is increased, and the above steps are performed again until the verification pass rate is greater than or equal to the preset threshold.
- the first detection model obtained by training is used to identify the human body nodes in the human body pictures in the verification set, and the recognition result is compared with the labeled human body nodes of those pictures to evaluate the recognition effect of the trained first detection model.
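- As a rough illustration of the training loop described above (an assumption-laden sketch, not the application's exact procedure), the 80/20 split and the 98% pass-rate criterion could be wired up as follows; `train_round`, `pass_rate` and `get_more_pictures` are hypothetical callables supplied by the caller:

```python
# Minimal sketch of the pre-training loop for the first detection model.
import random

def train_first_detection_model(pictures, train_round, pass_rate, get_more_pictures,
                                target_pass_rate=0.98, train_fraction=0.8):
    """pictures: labelled human-body pictures; train_round builds/updates the model,
    pass_rate evaluates it, get_more_pictures adds training data (all hypothetical)."""
    random.shuffle(pictures)
    split = int(train_fraction * len(pictures))       # e.g. 80% as the training set
    train_set, val_set = pictures[:split], pictures[split:]

    model = None
    while True:
        model = train_round(model, train_set)         # parameters tuned during training
        if pass_rate(model, val_set) >= target_pass_rate:
            return model                              # verification pass rate >= 98%
        train_set = train_set + get_more_pictures()   # otherwise add pictures, retrain
```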
- the calling the preset first detection model to detect the head region in the image to be detected includes:
- multiple human body nodes of each human body in the image to be detected are detected through the preset first detection model, for example, a neural network model.
- the human body node may be an important position of the human body such as the joint points of the human body and the facial features.
- the multiple human body nodes include at least multiple nodes of the head and the neck.
- the multiple human body nodes include: one or more of a neck node, a nose tip node, a left eye node, a right eye node, a left ear node, and a right ear node.
- the multiple human body nodes determined by the preset first detection model further include at least a wrist node, an elbow node, and a shoulder node.
- Each human body node represents the human body area including the node, for example, the left eye node represents the entire left eye area of the human body, rather than just a specific pixel.
- the head area is an area determined according to multiple nodes of the head and the neck to characterize the human head.
- the head area of the human body is determined according to the neck node, nose tip node, left eye node, right eye node, left ear node, and right ear node.
- the determined shape of the head region can be rectangular, circular, oval, or any other regular or irregular shapes. This application does not specifically limit the shape of the determined head region.
- the process of pre-training the first detection model may be an offline training process.
- the process of calling the first detection model to detect the head region in the image to be detected may be an online detection process. That is, the image to be detected is used as the input of the first detection model, and the output is the human node information in the image to be detected, for example, the top of the head, eyes, mouth, chin, ears, neck, etc.
- According to the multiple human body nodes, the human head is framed with a geometric figure, such as a rectangular frame, and this rectangular frame is called the head frame.
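- A minimal sketch of how a rectangular head frame might be derived from such keypoints; the keypoint names and the padding factor are illustrative assumptions, not values specified by the application:

```python
# Build a rectangular "head frame" from head/neck keypoints returned by a pose model.
def head_frame(keypoints):
    """keypoints: dict mapping names such as 'nose', 'left_eye', 'right_eye',
    'left_ear', 'right_ear', 'neck' to (x, y) pixel coordinates."""
    names = ("nose", "left_eye", "right_eye", "left_ear", "right_ear", "neck")
    pts = [keypoints[n] for n in names if n in keypoints]
    xs, ys = [p[0] for p in pts], [p[1] for p in pts]
    # Pad the tight keypoint box a little so the frame covers the whole head.
    pad_x = 0.3 * (max(xs) - min(xs) + 1)
    pad_y = 0.3 * (max(ys) - min(ys) + 1)
    return (min(xs) - pad_x, min(ys) - pad_y, max(xs) + pad_x, max(ys) + pad_y)
```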
- the preset second detection model is called to detect the shape area in the image to be detected.
- the preset second detection model can be implemented using an accelerated version of the region-based convolutional neural network (Faster-RCNN).
- the preset second detection model is pre-trained using a large number of human images.
- the preset second detection model may be trained before acquiring the image to be detected including multiple targets.
- the process of training the second detection model in advance is similar to the process of training the first detection model in advance, and will not be repeated here.
- the shape area in the image to be detected is recognized by inputting the image to be detected into the second detection model.
- the process of pre-training the second detection model may be an offline training process.
- the process of calling the preset second detection model to detect the body area in the image to be detected may be an online detection process. That is, the image to be detected is used as the input of the second detection model, and the output is the human body information in the image to be detected. According to the human body information, the human body's shape area is framed with a rectangular frame, which is called a pedestrian frame.
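- As one possible realization, an off-the-shelf Faster R-CNN from a recent torchvision release can produce such pedestrian frames; the application does not prescribe this library, and the score threshold below is an arbitrary example:

```python
# Obtain pedestrian frames with a pre-trained Faster R-CNN (COCO label 1 = "person").
import torch, torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

def pedestrian_frames(image_tensor, score_threshold=0.5):
    """image_tensor: float tensor of shape (3, H, W) scaled to [0, 1]."""
    with torch.no_grad():
        out = model([image_tensor])[0]
    keep = (out["labels"] == 1) & (out["scores"] >= score_threshold)
    return out["boxes"][keep].tolist()   # [x1, y1, x2, y2] pedestrian frames
```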
- the method of parallel processing is used to simultaneously call the preset first detection model to detect the head area in the image to be detected and call the preset second detection model to detect the body area in the image to be detected.
- the parallel processing method is used to simultaneously input the image to be detected into the preset first detection model to detect the head area and into the preset second detection model to detect the body area, which saves processing time and improves processing efficiency.
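- A small sketch of this parallel invocation using a thread pool; `detect_heads` and `detect_bodies` stand in for the preset first and second detection models:

```python
# Run the two detection models on the same image concurrently.
from concurrent.futures import ThreadPoolExecutor

def detect_in_parallel(image, detect_heads, detect_bodies):
    with ThreadPoolExecutor(max_workers=2) as pool:
        head_future = pool.submit(detect_heads, image)
        body_future = pool.submit(detect_bodies, image)
        return head_future.result(), body_future.result()
```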
- an area ratio can be calculated based on the head region and the body region.
- the area ratio refers to the ratio of the area of the intersection of the head region and the body region to the area of the head region.
- the determining the area ratio based on the head area and the body area includes:
- the area ratio is calculated based on the first area and the second area.
- the position coordinate system is established with the upper left corner of the image to be detected as the origin, the upper edge of the image as the X axis, and the left side of the image as the Y axis.
- the first position coordinates of each vertex of the head frame corresponding to the head area (taking a rectangular frame as an example) are obtained, and the second position coordinates of each vertex of the body frame corresponding to the body area (likewise taking a rectangular frame as an example) are obtained.
- the first area of the head region is determined according to the first position coordinates; the intersection region of the head region and the body region is determined according to the first position coordinates and the second position coordinates; then the third position coordinates of each vertex of the intersection region are obtained, and the second area of the intersection region is determined according to the third position coordinates.
- an area ratio (Intersection over Union, IOU) is calculated according to the first area and the second area.
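- The area ratio defined above (area of the intersection of the head frame and the pedestrian frame, divided by the area of the head frame) can be computed directly from the vertex coordinates; boxes below are assumed to be (x1, y1, x2, y2) tuples in the image coordinate system described above:

```python
# Area ratio: intersection(head frame, pedestrian frame) / area(head frame).
def area_ratio(head_box, body_box):
    hx1, hy1, hx2, hy2 = head_box
    bx1, by1, bx2, by2 = body_box
    head_area = max(0.0, hx2 - hx1) * max(0.0, hy2 - hy1)      # first area
    ix1, iy1 = max(hx1, bx1), max(hy1, by1)
    ix2, iy2 = min(hx2, bx2), min(hy2, by2)
    inter_area = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)     # second area
    return inter_area / head_area if head_area > 0 else 0.0

# e.g. a head frame half covered by the pedestrian frame gives 0.5
print(area_ratio((10, 10, 20, 20), (15, 0, 60, 100)))          # -> 0.5
```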
- S15 Determine whether there is an area ratio that is smaller than a preset first threshold in the area ratios, where the preset first threshold is less than 1.
- the head area is contained in the body area, that is, the head frame is contained in the pedestrian frame.
- if the pedestrian is not occluded, the pedestrian's head area is completely contained in the body area, and the calculated area ratio should be 1.
- if the pedestrian is partially occluded, the pedestrian's head area is only partially contained in the body area, and the calculated area ratio at this time is less than 1.
- if the pedestrian's body area is completely occluded, the pedestrian's head area is not contained in the body area at all, and the calculated area ratio at this time is 0.
- a first threshold may be preset, and the preset first threshold is less than 1, which may be, for example, 0.7.
- the ratio of the intersection of the head frame and the pedestrian frame to the head frame is used to measure the overlap of the head frame and the pedestrian frame or to determine whether the head frame matches the pedestrian frame.
- the larger the area ratio, the greater the overlap between the head frame and the pedestrian frame, and the better the head frame matches the pedestrian frame.
- the magnitude relationship between each area ratio and the preset first threshold can be determined. If there is a target area ratio that is less than the preset first threshold in the multiple area ratios, it indicates that the pedestrian corresponding to the target area ratio in the image to be tested is severely blocked. If each of the multiple area ratios is greater than or equal to the preset first threshold, it indicates that multiple pedestrians in the image to be tested are not blocked or are not seriously blocked.
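- A hedged sketch of this occlusion check; the rule used to pair each head frame with a pedestrian frame (the body frame with the largest area ratio) is an assumption for the example, and `area_ratio` is the helper sketched earlier:

```python
# Flag head/body pairs whose area ratio falls below the first threshold.
FIRST_THRESHOLD = 0.7   # example value from the text, must be < 1

def find_occluded(head_boxes, body_boxes, threshold=FIRST_THRESHOLD):
    occluded = []
    for head in head_boxes:
        best_body = max(body_boxes, key=lambda b: area_ratio(head, b))
        ratio = area_ratio(head, best_body)
        if ratio < threshold:
            occluded.append((head, best_body, ratio))   # severely occluded pedestrian
    return occluded
```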
- the severely occluded pedestrian in the image to be detected may be first segmented according to the head region and the shape region.
- the segmenting the occluded pedestrian according to the head area and the shape area includes:
- the central axis of the two head areas is used as the dividing line, and the key points of the shoulders are used as the boundary to divide the shaded pedestrian.
- a second threshold may be preset, and the preset second threshold is smaller than the preset first threshold, for example, 0.3.
- when the area ratio is greater than the preset second threshold but less than the preset first threshold, although the corresponding pedestrian is severely occluded, the pedestrian's head area has still been accurately detected.
- at this time, the pedestrian's detected body does not match the human head, the confidence of such a detection is relatively low, and it is easy to be screened out as a false detection during post-processing.
- therefore, the corresponding body area is expanded according to a preset scale factor (for example, 1.5) before segmentation, which improves the detection confidence of the occluded pedestrian and thereby reduces the risk of the occluded pedestrian being filtered out during post-processing screening.
- Pedestrian A and Pedestrian B share a human body frame, but correspond to two head frames.
- the human body frame can be marked as a double-person frame, and Pedestrian A is separated out along the central axis of the two head frames, with the key points of the shoulders used as the left and right boundaries of the human body.
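- The two segmentation branches just described might be sketched as follows; the 0.3 second threshold and the 1.5 scale factor are the example values from the text, while the assumption that Pedestrian A stands on the left and the exact use of the shoulder keypoints are illustrative simplifications:

```python
SECOND_THRESHOLD = 0.3   # example value; smaller than the first threshold
SCALE_FACTOR = 1.5       # example expansion factor from the text

def expand_box(box, factor=SCALE_FACTOR):
    """Enlarge a (x1, y1, x2, y2) box about its center by `factor`."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    w, h = (x2 - x1) * factor, (y2 - y1) * factor
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

def segment_pedestrian_a(ratio, body_box, head_box_a, head_box_b, shoulder_xs_a):
    """shoulder_xs_a: x coordinates of Pedestrian A's two shoulder keypoints."""
    x1, y1, x2, y2 = body_box
    if ratio > SECOND_THRESHOLD:
        # Head reliably detected: enlarge the body frame and segment from it.
        return expand_box(body_box)
    # Two pedestrians share one (double) body frame: cut along the central axis
    # between the two head frames, with the shoulder keypoints as boundaries.
    axis_x = (head_box_a[0] + head_box_a[2] + head_box_b[0] + head_box_b[2]) / 4
    left = min(min(shoulder_xs_a), x1)      # assumes Pedestrian A is on the left
    return (left, y1, axis_x, y2)
```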
- the preset tracking algorithm may be a multi-target tracking algorithm. After the pedestrians in the image to be tested are segmented, the segmented pedestrians and the pedestrians that are not blocked can be tracked.
- the multi-target tracking algorithm is an existing technology, and this article will not elaborate on it.
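- Since the application leaves the concrete multi-target tracking algorithm open, the sketch below shows one common choice, a simple greedy IoU association between the previous frame's tracks and the current frame's segmented boxes; it is not presented as the method claimed here:

```python
# Greedy IoU association between existing tracks and current detections.
def iou(a, b):
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    ix1, iy1, ix2, iy2 = max(ax1, bx1), max(ay1, by1), min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def associate(tracks, detections, min_iou=0.3):
    """tracks: dict track_id -> last box; detections: list of boxes.
    Returns dict track_id -> matched detection index."""
    matches, used = {}, set()
    for tid, tbox in tracks.items():
        best = max((i for i in range(len(detections)) if i not in used),
                   key=lambda i: iou(tbox, detections[i]), default=None)
        if best is not None and iou(tbox, detections[best]) >= min_iou:
            matches[tid] = best
            used.add(best)
    return matches
```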
- the method further includes:
- the pedestrian corresponding to the physical area is determined as the target tracking object
- the pedestrian corresponding to the head area is determined as the target tracking object
- when the area ratio is greater than or equal to the preset first threshold, that is, when the pedestrians in the image to be detected are not occluded or not severely occluded, it is necessary to further determine whether the area ratio is 1 in order to distinguish whether a pedestrian in the image to be detected is unoccluded or only slightly occluded.
- if an area ratio is 1, it means that the head area of the corresponding pedestrian is completely contained in the body area, that is, the pedestrian is not occluded; since the determined body area covers the entire pedestrian, the pedestrian corresponding to the body area is taken as the target tracking object, and the tracking effect is better. If an area ratio is not 1, it indicates that the corresponding pedestrian is slightly occluded; because the head area is more distinctive, the pedestrian corresponding to the head area is taken as the target tracking object, and the tracking effect is better.
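- A compact sketch of this selection rule (values from this embodiment; in practice a small tolerance might replace the exact comparison with 1):

```python
# Choose the box to track when the pedestrian is not severely occluded.
FIRST_THRESHOLD = 0.7    # example value from the text

def select_tracking_box(ratio, head_box, body_box, first_threshold=FIRST_THRESHOLD):
    if ratio < first_threshold:
        raise ValueError("severely occluded: segment the pedestrian first")
    # ratio == 1: head frame fully inside the pedestrian frame, track the body;
    # otherwise the pedestrian is slightly occluded, track the more distinct head.
    return body_box if ratio == 1 else head_box
```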
- the multi-target tracking method described in this application first acquires an image to be detected that contains multiple targets with occlusion, calls the preset first detection model and the preset second detection model respectively to detect the head area and the body area in the image to be detected, and calculates an area ratio of the head area to the body area; when there is an area ratio smaller than the preset first threshold, it is determined that a pedestrian in the image to be detected is occluded, the occluded pedestrian is then segmented according to the head area and the body area, and finally the preset tracking algorithm is called to track the segmented occluded pedestrian and the unoccluded pedestrians.
- This application uses the area ratio to measure how much a pedestrian is occluded, so that occluded pedestrians can be detected; in addition, the target tracking object is determined by combining the head area and the body area, which reduces missed detections or false detections caused by the pedestrian's body being occluded and improves the effect of target tracking. It can therefore be applied in scenes with complex backgrounds, and can track targets quickly and accurately, especially when there are occlusions between multiple targets, which has high practical value.
- FIG. 2 is a structural diagram of a multi-target tracking device provided in Embodiment 2 of the present application.
- the multi-target tracking device 20 may include multiple functional modules composed of computer-readable instruction code segments.
- the code of each computer-readable instruction code segment in the multi-target tracking device 20 may be stored in the memory of the computer device and executed by the at least one processor to detect and track multiple targets with occlusion (see FIG. 1 for details).
- the multi-target tracking device 20 can be divided into multiple functional modules according to the functions it performs.
- the functional modules may include: an acquisition module 201, a detection module 202, a training module 203, a calculation module 204, a judgment module 205, a determination module 206, a segmentation module 207, and a tracking module 208.
- the module referred to in this application refers to a series of computer-readable instruction segments that can be executed by at least one processor and can complete fixed functions, and are stored in a memory. In this embodiment, the function of each module will be described in detail in subsequent embodiments.
- the acquisition module 201 is used to acquire a to-be-detected image including multiple targets.
- the image to be detected may be any suitable image that requires target tracking, for example, an image collected for a monitored area.
- the image to be detected may be a static image collected by an image collection device such as a camera, or any video frame in a video collected by an image collection device such as a camera.
- the image to be detected may be an original image or an image obtained after preprocessing the original image.
- the image to be detected contains multiple pedestrians, and the body parts of the multiple pedestrians may overlap significantly. That is, the target tracking object is determined even when the body parts of multiple pedestrians overlap heavily, so as to prevent misdetection or missed detection caused by a pedestrian being blocked by other pedestrians.
- the detection module 202 is configured to call a preset first detection model to detect the head region in the image to be detected.
- the first detection model can be trained in advance, and by directly calling the pre-trained first detection model, multiple human body nodes of each human body in the image to be detected can be detected directly and quickly.
- the preset first detection model may be various detection models based on deep learning, for example, a detection model based on a neural network, or a detection model based on a residual network.
- the training module 203 is configured to pre-train the first detection model, where the training process of the first detection model includes:
- tools such as OpenPose or PoseMachine are used to label multiple human body nodes in the head region of the human body pictures, for example, the left-eye node, right-eye node, left-ear node, and right-ear node.
- Extract a first preset ratio of human body pictures as the sample picture set to be trained (referred to as the training set), and extract the second preset ratio of human body pictures as the sample picture set to be verified (referred to as the verification set for short).
- the number of human body pictures in the training set is much larger than the number of human body pictures in the verification set; for example, 80% of the human body pictures are used as the training set, and the remaining 20% are used as the verification set.
- the parameters of the neural network initially adopt the default parameters, and the parameters are then adjusted continuously during the training process.
- the generated first detection model is verified using the human body pictures in the verification set. If the verification pass rate is greater than or equal to a preset threshold, for example, the pass rate is greater than or equal to 98%, the training ends.
- the first detection model obtained by this training is used to identify the human body node. If the verification pass rate is less than the preset threshold, for example, less than 98%, the number of human body pictures participating in the training is increased, and the above steps are performed again until the verification pass rate is greater than or equal to the preset threshold.
- the first detection model obtained by training is used to identify the human body nodes in the human body pictures in the verification set, and the recognition result is compared with the labeled human body nodes of those pictures to evaluate the recognition effect of the trained first detection model.
- the detection module 202 calling a preset first detection model to detect the head region in the image to be detected includes:
- multiple human body nodes of each human body in the image to be detected are detected through the preset first detection model, for example, a neural network model.
- the human body node may be an important position of the human body such as the joint points of the human body and the facial features.
- the multiple human body nodes include at least multiple nodes of the head and the neck.
- the multiple human body nodes include: one or more of a neck node, a nose tip node, a left eye node, a right eye node, a left ear node, and a right ear node.
- the multiple human body nodes determined by the preset first detection model further include at least a wrist node, an elbow node, and a shoulder node.
- Each human body node represents the human body area including the node, for example, the left eye node represents the entire left eye area of the human body, rather than just a specific pixel.
- the head area is an area determined according to multiple nodes of the head and the neck to characterize the human head.
- the head area of the human body is determined according to the neck node, nose tip node, left eye node, right eye node, left ear node, and right ear node.
- the determined shape of the head region can be rectangular, circular, oval, or any other regular or irregular shapes. This application does not specifically limit the shape of the determined head region.
- the process of pre-training the first detection model may be an offline training process.
- the process of calling the first detection model to detect the head region in the image to be detected may be an online detection process. That is, the image to be detected is used as the input of the first detection model, and the output is the human node information in the image to be detected, for example, the top of the head, eyes, mouth, chin, ears, neck, etc.
- According to the multiple human body nodes, the human head is framed with a geometric figure, such as a rectangular frame, and this rectangular frame is called the head frame.
- the detection module 202 is further configured to call a preset second detection model to detect the shape area in the image to be detected.
- the preset second detection model is called to detect the shape area in the image to be detected.
- the preset second detection model can be implemented using an accelerated version of the region-based convolutional neural network (Faster-RCNN).
- the preset second detection model is pre-trained using a large number of human images.
- the preset second detection model may be trained before acquiring the image to be detected including multiple targets.
- the process of pre-training the second detection model is similar to the foregoing process of pre-training the first detection model, and will not be repeated here.
- the shape area in the image to be detected is recognized by inputting the image to be detected into the second detection model.
- the process of pre-training the second detection model may be an offline training process.
- the process of calling the preset second detection model to detect the body area in the image to be detected may be an online detection process. That is, the image to be detected is used as the input of the second detection model, and the output is the human body information in the image to be detected. According to the human body information, the human body's shape area is framed with a rectangular frame, which is called a pedestrian frame.
- the method of parallel processing is used to simultaneously call the preset first detection model to detect the head area in the image to be detected and call the preset second detection model to detect the body area in the image to be detected.
- the parallel processing method is used to simultaneously input the image to be detected into the preset first detection model to detect the head area and into the preset second detection model to detect the body area, which saves processing time and improves processing efficiency.
- the calculation module 204 is configured to calculate an area ratio based on the head area and the body area.
- an area ratio can be calculated based on the head region and the body region.
- the area ratio refers to the ratio of the area of the intersection of the head region and the body region to the area of the head region.
- the calculation module 204 determining the area ratio according to the head area and the body area includes:
- the area ratio is calculated based on the first area and the second area.
- the position coordinate system is established with the upper left corner of the image to be detected as the origin, the upper edge of the image as the X axis, and the left side of the image as the Y axis.
- the first position coordinates of each vertex of the head frame corresponding to the head area (taking a rectangular frame as an example) are obtained, and the second position coordinates of each vertex of the body frame corresponding to the body area (likewise taking a rectangular frame as an example) are obtained.
- the first area of the head region is determined according to the first position coordinates; the intersection region of the head region and the body region is determined according to the first position coordinates and the second position coordinates; then the third position coordinates of each vertex of the intersection region are obtained, and the second area of the intersection region is determined according to the third position coordinates.
- an area ratio (Intersection over Union, IOU) is calculated according to the first area and the second area.
- the judging module 205 is used to judge whether there is an area ratio smaller than a preset first threshold in the area ratios, wherein the preset first threshold is less than one.
- the head area is contained in the body area, that is, the head frame is contained in the pedestrian frame.
- if the pedestrian is not occluded, the pedestrian's head area is completely contained in the body area, and the calculated area ratio should be 1.
- if the pedestrian is partially occluded, the pedestrian's head area is only partially contained in the body area, and the calculated area ratio at this time is less than 1.
- if the pedestrian's body area is completely occluded, the pedestrian's head area is not contained in the body area at all, and the calculated area ratio at this time is 0.
- a first threshold may be preset, and the preset first threshold is less than 1, which may be, for example, 0.7.
- the ratio of the intersection of the head frame and the pedestrian frame to the head frame is used to measure the overlap of the head frame and the pedestrian frame or to determine whether the head frame matches the pedestrian frame.
- the larger the area ratio, the greater the overlap between the head frame and the pedestrian frame, and the better the head frame matches the pedestrian frame.
- the determining module 206 is configured to determine that a pedestrian is blocked in the image to be detected when there is an area ratio in the area ratio that is less than the preset first threshold.
- the magnitude relationship between each area ratio and the preset first threshold can be determined. If there is a target area ratio that is less than the preset first threshold in the multiple area ratios, it indicates that the pedestrian corresponding to the target area ratio in the image to be tested is severely blocked. If each of the multiple area ratios is greater than or equal to the preset first threshold, it indicates that multiple pedestrians in the image to be tested are not blocked or are not seriously blocked.
- the segmentation module 207 is used to segment the occluded pedestrians according to the head area and the body area.
- the severely occluded pedestrian in the image to be detected may be first segmented according to the head region and the shape region.
- the segmentation module 207 segmenting the occluded pedestrian according to the head area and the body area includes:
- the central axis of the two head areas is used as the dividing line, and the key points of the shoulders are used as the boundary to divide the shaded pedestrian.
- a second threshold may be preset, and the preset second threshold is smaller than the preset first threshold, for example, 0.3.
- when the area ratio is greater than the preset second threshold but less than the preset first threshold, although the corresponding pedestrian is severely occluded, the pedestrian's head area has still been accurately detected.
- at this time, the pedestrian's detected body does not match the human head, the confidence of such a detection is relatively low, and it is easy to be screened out as a false detection during post-processing.
- therefore, the corresponding body area is expanded according to a preset scale factor (for example, 1.5) before segmentation, which improves the detection confidence of the occluded pedestrian and thereby reduces the risk of the occluded pedestrian being filtered out during post-processing screening.
- Pedestrian A and Pedestrian B share a human body frame, but correspond to two head frames.
- the human body frame can be marked as a double-person frame, and Pedestrian A is separated out along the central axis of the two head frames, with the key points of the shoulders used as the left and right boundaries of the human body.
- the tracking module 208 is configured to call a preset tracking algorithm to track the segmented pedestrians that are blocked and those that are not blocked.
- the preset tracking algorithm may be a multi-target tracking algorithm. After the pedestrians in the image to be tested are segmented, the segmented pedestrians and the pedestrians that are not blocked can be tracked.
- the multi-target tracking algorithm is an existing technology, and this article will not elaborate on it.
- the judging module 205 is further configured to judge whether the area ratio is 1 when the area ratio is greater than or equal to the preset first threshold.
- the determining module 206 is further configured to determine the pedestrian corresponding to the physical area as the target tracking object when the area ratio is 1.
- the determining module 206 is further configured to determine the pedestrian corresponding to the head area as the target tracking object when the area ratio is not 1.
- the tracking module 208 is further configured to call the preset tracking algorithm to track the target tracking object.
- when the area ratio is greater than or equal to the preset first threshold, that is, when the pedestrians in the image to be detected are not occluded or not severely occluded, it is necessary to further determine whether the area ratio is 1 in order to distinguish whether a pedestrian in the image to be detected is unoccluded or only slightly occluded.
- if an area ratio is 1, it means that the head area of the corresponding pedestrian is completely contained in the body area, that is, the pedestrian is not occluded; since the determined body area covers the entire pedestrian, the pedestrian corresponding to the body area is taken as the target tracking object, and the tracking effect is better. If an area ratio is not 1, it indicates that the corresponding pedestrian is slightly occluded; because the head area is more distinctive, the pedestrian corresponding to the head area is taken as the target tracking object, and the tracking effect is better.
- the multi-target tracking device described in this application first acquires an image to be detected that contains multiple targets with occlusion, respectively calls the preset first detection model and the preset second detection model to detect the head area and the body area in the image to be detected, and calculates an area ratio of the head area to the body area; when there is an area ratio smaller than the preset first threshold, it is determined that a pedestrian in the image to be detected is occluded, the occluded pedestrian is then segmented according to the head area and the body area, and finally the preset tracking algorithm is called to track the segmented occluded pedestrian and the unoccluded pedestrians.
- This application uses the area ratio to measure how much a pedestrian is occluded, so that occluded pedestrians can be detected; in addition, the target tracking object is determined by combining the head area and the body area, which reduces missed detections or false detections caused by the pedestrian's body being occluded and improves the effect of target tracking. It can therefore be applied in scenes with complex backgrounds, and can track targets quickly and accurately, especially when there are occlusions between multiple targets, which has high practical value.
- the computer device 3 includes a memory 31, at least one processor 32, at least one communication bus 33, and a transceiver 34.
- the structure shown in FIG. 3 does not constitute a limitation of the embodiment of the present application; the connection may be a bus-type structure or a star structure.
- the computer device 3 may also include more or less hardware or software than shown in the figure, or a different arrangement of components.
- the computer device 3 is a device that can automatically perform numerical calculation and/or information processing in accordance with preset or stored instructions; its hardware includes, but is not limited to, microprocessors, application-specific integrated circuits, programmable gate arrays, digital processors, and embedded devices.
- the computer device 3 may also include a client device; the client device includes, but is not limited to, any electronic product that can interact with a client through a keyboard, a mouse, a remote control, a touch panel, or a voice control device, for example, personal computers, tablet computers, smart phones, digital cameras, etc.
- the computer device 3 is only an example; other existing or future electronic products that can be adapted to this application should also be included in the scope of protection of this application and are incorporated herein by reference.
- the memory 31 is used to store computer-readable instruction codes and various data, such as the multi-target tracking device 20 installed in the computer device 3, and realizes high-speed, automatic access to computer-readable instructions or data during the operation of the computer device 3.
- the memory 31 includes Read-Only Memory (ROM), Programmable Read-Only Memory (PROM), Erasable Programmable Read-Only Memory (EPROM), One-time Programmable Read-Only Memory (OTPROM), Electrically-Erasable Programmable Read-Only Memory (EEPROM), Compact Disc Read-Only Memory (CD-ROM) or other optical disk storage, magnetic disk storage, tape storage, or any other computer-readable medium that can be used to carry or store data.
- the at least one processor 32 may be composed of integrated circuits, for example, a single packaged integrated circuit, or multiple integrated circuits with the same function or different functions, including one or a combination of several central processing units (CPUs), microprocessors, digital processing chips, graphics processors, and various control chips.
- the at least one processor 32 is the control core (Control Unit) of the computer device 3; it uses various interfaces and lines to connect the components of the entire computer device 3, runs or executes the computer-readable instructions or modules stored in the memory 31, and calls the data stored in the memory 31 to perform the various functions of the computer device 3 and to process data, for example, to perform multi-target tracking.
- the at least one communication bus 33 is configured to implement connection and communication between the memory 31 and the at least one processor 32 and the like.
- the computer device 3 may also include a power supply for supplying power to various components.
- the power supply may be logically connected to the at least one processor 32 through a power management device, so that functions such as charge management, discharge management, and power consumption management are realized through the power management device.
- the computer device 3 may also include various sensors, Bluetooth modules, Wi-Fi modules, etc., which will not be repeated here.
- the above-mentioned integrated unit implemented in the form of a software function module can be stored in a non-volatile readable storage medium and includes several instructions to enable a computer device (which can be a personal computer, a computer device, or a network device, etc.) or a processor to execute part of the method described in each embodiment of the present application.
- the at least one processor 32 can execute the operating system of the computer device 3 and various installed applications (such as the multi-target tracking device 20), for example, the various modules mentioned above.
- the memory 31 stores computer readable instruction codes
- the at least one processor 32 can call the computer readable instruction codes stored in the memory 31 to perform related functions.
- the various modules described in FIG. 2 are computer-readable instruction codes stored in the memory 31 and executed by the at least one processor 32, so as to realize the functions of the various modules and achieve the goal of multi-target tracking.
- the memory 31 stores multiple instructions, and the multiple instructions are executed by the at least one processor 32 to achieve multi-target tracking.
- the at least one processor 32 executes the multiple instructions to achieve multi-target tracking.
- the modules described as separate components may or may not be physically separated, and the components displayed as modules may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
- the functional modules in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
- the above-mentioned integrated unit can be implemented in the form of hardware or in the form of hardware plus software functional modules.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Biophysics (AREA)
- Evolutionary Computation (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Biomedical Technology (AREA)
- Human Computer Interaction (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Multimedia (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Image Analysis (AREA)
- Traffic Control Systems (AREA)
Abstract
A multi-target tracking method, a multi-target tracking apparatus, a computer device, and a storage medium, comprising: acquiring an image to be examined containing a plurality of targets (S11); using a preset first detection model to detect head regions in the image to be examined (S12); using a preset second detection model to detect figure regions in the image to be examined (S13); on the basis of the head regions and the figure regions, calculating regional ratios (S14); determining whether a regional ratio smaller than a preset first threshold is present in the regional ratios, the preset first threshold being less than 1 (S15); when a regional ratio less than the preset first threshold is present, determining that an obstructed pedestrian is present in the image to be examined (S16); on the basis of the head region and the figure region, separating out the obstructed pedestrian (S17); and using a preset tracking algorithm to track the separated obstructed pedestrian and an unobstructed pedestrian (S18). In the present method, tracking targets are determined by means of combining both a head region and a figure region, and said method therefore features an excellent tracking effect when multiple obstructed targets are present.
Description
This application claims the priority of the Chinese patent application filed with the Chinese Patent Office on April 26, 2019, with application number 201910345956.8 and the invention title "Multi-target tracking method, device, computer equipment and storage medium", the entire content of which is incorporated herein by reference.
This application relates to the technical field of target tracking, in particular to a multi-target tracking method, device, computer equipment and storage medium.
With the continuous progress of society and the rapid development of economic construction, video surveillance is increasingly used in various industries and fields. An intelligent video analysis and monitoring system can automatically identify different objects, find abnormal situations in the monitoring screen, and issue alarms and provide useful information in the fastest and best way, so as to more effectively assist security personnel in dealing with crises.
Target detection is a basic function of video analysis technology and is of great significance to subsequent applications such as target tracking, target recognition, and behavior analysis; its importance is self-evident, especially in the field of real-time target event monitoring.
As a non-rigid body, the human body has various morphological changes and is prone to occlusion, and video scenes change in complex and diverse ways, which makes effective detection and tracking of pedestrians in video very difficult. In practical application scenarios, there are problems such as varied pedestrian poses, occlusion of the human body, sudden light changes, and disturbance of the background environment, so how to track targets quickly and accurately in a video with a complex background, especially when there are occlusions between multiple targets, remains an important and difficult point in the field of video image processing technology.
Summary of the invention
In view of the above, it is necessary to propose a multi-target tracking method, device, computer equipment and storage medium, which aim to solve the problem of tracking multiple occluded targets; by combining the head area and the body area to determine the target tracking object, the effect of target tracking can be improved.
The first aspect of the present application provides a multi-target tracking method, the method including:
obtaining an image to be detected including multiple targets;
calling a preset first detection model to detect the head area in the image to be detected;
calling a preset second detection model to detect the body area in the image to be detected;
calculating an area ratio based on the head area and the body area;
judging whether there is an area ratio smaller than a preset first threshold among the area ratios, where the preset first threshold is less than 1;
when there is an area ratio smaller than the preset first threshold among the area ratios, determining that a pedestrian in the image to be detected is occluded;
segmenting the occluded pedestrian according to the head area and the body area;
calling a preset tracking algorithm to track the segmented occluded pedestrian and the unoccluded pedestrians.
Preferably, when the area ratio is greater than or equal to the preset first threshold, the method further includes:
judging whether the area ratio is 1;
when the area ratio is 1, determining the pedestrian corresponding to the body area as the target tracking object;
when the area ratio is not 1, determining the pedestrian corresponding to the head area as the target tracking object;
calling the preset tracking algorithm to track the target tracking object.
Preferably, segmenting the occluded pedestrian according to the head area and the body area includes:
judging whether the area ratio is greater than a preset second threshold, where the preset second threshold is less than the preset first threshold;
when the area ratio is greater than the preset second threshold, expanding the body area according to a preset scale factor;
segmenting the occluded pedestrian according to the expanded body area.
Preferably, when the area ratio is less than or equal to the preset second threshold, the method further includes:
segmenting the occluded pedestrian using the central axis of the two head areas as the dividing line and the key points of the shoulders as the boundary.
Preferably, the preset first detection model is called to detect the head area in the image to be detected and the preset second detection model is called to detect the body area in the image to be detected simultaneously, in a parallel processing manner.
Preferably, calling the preset first detection model to detect the head area in the image to be detected includes:
calling the preset first detection model to detect multiple human body nodes of each human body in the image to be detected;
determining the head area corresponding to each human body in the image to be detected according to the multiple human body nodes of that human body.
Preferably, calculating the area ratio based on the head area and the body area includes:
establishing a position coordinate system according to the image to be detected;
obtaining the first area of the head region in the position coordinate system;
obtaining the second area, in the position coordinate system, of the intersection region of the head region and the body region;
calculating the area ratio based on the first area and the second area.
The second aspect of the present application provides a multi-target tracking device, the device including:
an acquisition module, used to obtain an image to be detected including multiple targets;
a detection module, used to call a preset first detection model to detect the head area in the image to be detected;
the detection module being further used to call a preset second detection model to detect the body area in the image to be detected;
a calculation module, used to calculate an area ratio based on the head area and the body area;
a judging module, used to judge whether there is an area ratio smaller than a preset first threshold among the area ratios, where the preset first threshold is less than 1;
a segmentation module, used to determine that a pedestrian in the image to be detected is occluded when there is an area ratio smaller than the preset first threshold among the area ratios, and to segment the occluded pedestrian according to the head area and the body area;
a tracking module, used to call a preset tracking algorithm to track the segmented occluded pedestrian and the unoccluded pedestrians.
本申请的第三方面提供一种计算机设备,所述计算机设备包括处理器,所述处理器用于执行存储器中存储的计算机可读指令时实现所述多目标跟踪方法。A third aspect of the present application provides a computer device that includes a processor, and the processor is configured to implement the multi-target tracking method when executing computer-readable instructions stored in a memory.
本申请的第四方面提供一种非易失性可读存储介质,其上存储有计算机可读指令,所述计算机可读指令被处理器执行时实现所述多目标跟踪方法。A fourth aspect of the present application provides a non-volatile readable storage medium having computer readable instructions stored thereon, and when the computer readable instructions are executed by a processor, the multi-target tracking method is implemented.
In summary, the multi-target tracking method, apparatus, computer device, and storage medium described in this application first acquire an image to be detected that contains multiple targets, some of which may be occluded, invoke a preset first detection model and a preset second detection model to detect the head regions and body regions in the image respectively, and calculate an area ratio for each head region and body region. When an area ratio is smaller than the preset first threshold, it is determined that a pedestrian in the image is occluded, the occluded pedestrian is segmented according to the head region and the body region, and a preset tracking algorithm is finally invoked to track the segmented occluded pedestrian and the unoccluded pedestrians. The application measures the occlusion of pedestrians through the area ratio, so that occluded pedestrians can still be detected; in addition, the target tracking object is determined jointly from the head region and the body region, which reduces missed and false detections caused by a pedestrian's body being occluded and improves the tracking effect. The method can therefore be applied to scenes with complex backgrounds and, in particular, can track targets quickly and accurately when multiple targets occlude one another, which gives it high practical value.
FIG. 1 is a flowchart of the multi-target tracking method provided in Embodiment 1 of the present application.
FIG. 2 is a structural diagram of the multi-target tracking apparatus provided in Embodiment 2 of the present application.
FIG. 3 is a schematic structural diagram of the computer device provided in Embodiment 3 of the present application.
The following specific embodiments further illustrate the present application in conjunction with the above drawings.
In order that the above objectives, features, and advantages of the application can be understood more clearly, the application is described in detail below with reference to the accompanying drawings and specific embodiments. It should be noted that, provided there is no conflict, the embodiments of the application and the features in the embodiments may be combined with one another.
Embodiment 1
FIG. 1 is a flowchart of the multi-target tracking method provided in Embodiment 1 of the present application.
In this embodiment, the multi-target tracking method can be applied to a computer device. For a computer device that needs to perform multi-target tracking, the multi-target tracking function provided by the method of this application can be integrated directly on the computer device, or can run on the computer device in the form of a software development kit (SDK).
As shown in FIG. 1, the multi-target tracking method specifically includes the following steps. Depending on requirements, the order of the steps in the flowchart may be changed and some steps may be omitted.
S11: acquire an image to be detected that contains multiple targets.
In this embodiment, the image to be detected may be any suitable image on which target tracking needs to be performed, for example an image captured of a monitored area. The image to be detected may be a static image captured by an image acquisition device such as a camera, or any video frame of a video captured by such a device.
The image to be detected may be an original image, or an image obtained after preprocessing an original image.
In this embodiment, the image to be detected contains multiple pedestrians whose body parts may overlap considerably. The target tracking objects are therefore determined even when the body parts of multiple pedestrians overlap substantially, so that a pedestrian is not misdetected or missed because it is occluded by other pedestrians.
S12: invoke a preset first detection model to detect the head regions in the image to be detected.
In this embodiment, the first detection model can be trained in advance. By directly invoking the pre-trained first detection model, the multiple human body nodes of each human body in the image to be detected can be detected directly and quickly. The preset first detection model may be any of various deep-learning-based detection models, for example a detection model based on a neural network or a detection model based on a residual network.
Preferably, before acquiring the image to be detected that contains multiple targets, the method further includes:
training the first detection model in advance, where the training process of the first detection model includes:
1) acquiring multiple human body pictures, and manually annotating multiple human body nodes in the head region of each human body picture to obtain a sample picture set;
2) extracting a first preset proportion of the human body pictures from the sample picture set as the sample picture set to be trained, and extracting a second preset proportion of the human body pictures from the sample picture set as the sample picture set to be verified;
3) training a preset neural network with the sample picture set to be trained to obtain the first detection model, and verifying the trained first detection model with the sample picture set to be verified;
4) if the verification pass rate is greater than or equal to a preset threshold, the training of the first detection model is complete; otherwise, increasing the number of human body pictures in the sample picture set to be trained and training and verifying the first detection model again.
For example, assume that 100,000 human body pictures are acquired and that tools such as OpenPose or PoseMachine are used to annotate multiple human body nodes in the head region of each picture, for example the left-eye node, right-eye node, left-ear node, and right-ear node. A first preset proportion of the human body pictures is extracted as the sample picture set to be trained (the training set for short), and a second preset proportion is extracted as the sample picture set to be verified (the verification set for short). The number of pictures in the training set is much larger than that in the verification set; for example, 80% of the human body pictures are used as the training set and the remaining 20% as the verification set.
When the neural network is trained for the first time to obtain the first detection model, the network uses its default parameters; the parameters are then adjusted continuously during training. After the first detection model has been generated, it is verified with the human body pictures in the verification set. If the verification pass rate is greater than or equal to a preset threshold, for example 98%, training ends and the trained first detection model is used to recognize human body nodes. If the verification pass rate is below the preset threshold, for example below 98%, the number of human body pictures participating in training is increased and the above steps are repeated until the verification pass rate reaches the preset threshold.
During testing, the trained first detection model is used to recognize human body nodes in the pictures of the verification set, and the recognition results are compared with the annotated nodes of those pictures to evaluate the recognition performance of the trained first detection model.
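As an illustrative, non-limiting sketch, the train-verify-retrain loop described above could be organized as follows in Python. The callables train_model, pass_rate, and add_samples are placeholders for the model training, verification, and additional data-collection steps, which this application does not prescribe.

```python
import random

def train_until_verified(samples, train_model, pass_rate, add_samples,
                         train_ratio=0.8, pass_threshold=0.98):
    """Split the annotated pictures into a training set and a verification
    set (e.g. 80/20), train, and keep enlarging the training set until the
    verification pass rate reaches the preset threshold (e.g. 98%)."""
    random.shuffle(samples)
    split = int(len(samples) * train_ratio)
    train_set, verify_set = samples[:split], samples[split:]

    while True:
        model = train_model(train_set)
        if pass_rate(model, verify_set) >= pass_threshold:
            return model           # verification passed; training is complete
        # Verification failed: add more annotated pictures and retrain.
        train_set = train_set + add_samples()
```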
Preferably, invoking the preset first detection model to detect the head regions in the image to be detected includes:
1) invoking the preset first detection model to detect multiple human body nodes of each human body in the image to be detected;
In this embodiment, the multiple human body nodes of each human body in the image to be detected are detected by the preset first detection model, for example a neural network model.
A human body node may be an important location on the human body, such as a joint or a facial feature. The multiple human body nodes include at least several nodes of the head and neck. For example, the multiple human body nodes include one or more of a neck node, a nose-tip node, a left-eye node, a right-eye node, a left-ear node, and a right-ear node. In other embodiments, the multiple human body nodes determined by the preset first detection model further include at least a wrist node, an elbow node, and a shoulder node.
Each human body node represents the body region that contains that node; for example, the left-eye node represents the entire left-eye region of the human body rather than a single specific pixel.
2) determining the head region corresponding to each human body in the image to be detected according to the multiple human body nodes of that human body.
In this embodiment, the head region is a region determined from the multiple nodes of the head and neck and used to characterize the human head. For example, the head region is determined from the neck node, nose-tip node, left-eye node, right-eye node, left-ear node, and right-ear node. The determined head region may be rectangular, circular, elliptical, or of any other regular or irregular shape; this application does not specifically limit the shape of the determined head region.
In this embodiment, training the first detection model in advance may be an offline process, while invoking the first detection model to detect the head regions in the image to be detected may be an online process. That is, the image to be detected is used as the input of the first detection model, and the output is the human body node information in the image; for example, the top of the head, the eyes, the mouth, the chin, the ears, and the neck are each presented as a human body node. The human head is then enclosed by a geometric figure, such as a rectangular box, determined from these nodes; this rectangular box is called the head frame.
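As an illustrative sketch only, a rectangular head frame could be derived from the detected head and neck nodes as follows; the dictionary layout of `nodes` and the padding margin are assumptions made for the example and are not fixed by this application.

```python
def head_frame(nodes, margin=0.2):
    """Derive a rectangular head frame from the head/neck nodes (e.g. nose
    tip, eyes, ears, neck). `nodes` maps node names to (x, y) pixel
    coordinates; the margin pads the tight bounding box slightly."""
    xs = [x for x, _ in nodes.values()]
    ys = [y for _, y in nodes.values()]
    left, right = min(xs), max(xs)
    top, bottom = min(ys), max(ys)
    pad_x = (right - left) * margin
    pad_y = (bottom - top) * margin
    # (left, top, right, bottom) of the head frame in image coordinates.
    return (left - pad_x, top - pad_y, right + pad_x, bottom + pad_y)
```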
S13: invoke a preset second detection model to detect the body regions in the image to be detected.
In this embodiment, after the image to be detected is acquired, a preset second detection model is invoked to detect the body regions in the image to be detected. The preset second detection model may be implemented with an accelerated region-based convolutional neural network (Faster-RCNN).
The preset second detection model is trained in advance on a large number of human body images, and may be trained before the image to be detected containing multiple targets is acquired. The process of training the second detection model in advance is similar to the process of training the first detection model described above and is not repeated here.
The body regions in the image to be detected are recognized by inputting the image to be detected into the second detection model.
In this embodiment, training the second detection model in advance may be an offline process, while invoking the preset second detection model to detect the body regions in the image to be detected may be an online process. That is, the image to be detected is used as the input of the second detection model, and the output is the human body information in the image; according to this information, each human body region is enclosed by a rectangular box, which is called the pedestrian frame.
Preferably, the preset first detection model is invoked to detect the head regions and the preset second detection model is invoked to detect the body regions in the image to be detected simultaneously, in a parallel processing manner. In this embodiment, feeding the image to be detected into the preset first detection model to determine the head regions and, at the same time, into the preset second detection model to determine the body regions saves processing time and improves processing efficiency.
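A minimal sketch of this parallel step, assuming the two models expose simple callable inference interfaces (detect_heads and detect_bodies are stand-ins, not interfaces defined by this application):

```python
from concurrent.futures import ThreadPoolExecutor

def detect_heads_and_bodies(image, detect_heads, detect_bodies):
    """Run the preset first and second detection models on the same image
    at the same time, returning the head frames and pedestrian frames."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        head_future = pool.submit(detect_heads, image)
        body_future = pool.submit(detect_bodies, image)
        return head_future.result(), body_future.result()
```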
S14: calculate area ratios based on the head regions and the body regions.
In this embodiment, after the multiple head regions and multiple body regions in the image to be detected have been determined, an area ratio can be calculated for each head region and body region.
The area ratio is the ratio of the area of the intersection of a head region and a body region to the area of the head region.
Preferably, determining the area ratio based on the head region and the body region includes:
establishing a position coordinate system based on the image to be detected;
acquiring a first area, namely the area occupied by the head region in the position coordinate system;
acquiring a second area, namely the area occupied by the intersection of the head region and the body region in the position coordinate system;
calculating the area ratio based on the first area and the second area.
In this embodiment, the position coordinate system is established with the upper-left corner of the image to be detected as the origin, the top edge of the image as the X axis, and the left edge of the image as the Y axis.
After the position coordinate system has been established, the first position coordinates of the vertices of the head frame corresponding to the head region (taking a rectangular frame as an example) and the second position coordinates of the vertices of the body frame corresponding to the body region (again taking a rectangular frame as an example) are acquired. The first area of the head region is determined from the first position coordinates, the intersection of the head region and the body region is determined from the first and second position coordinates, the third position coordinates of the vertices of the intersection are then acquired, and the second area of the intersection is determined from the third position coordinates. Finally, the area ratio (Intersection over Union, IOU) is calculated from the first area and the second area.
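The area-ratio computation described above can be sketched as follows; the (left, top, right, bottom) box representation is an assumption of the example.

```python
def area_ratio(head_box, body_box):
    """Ratio of the intersection of the head frame and the pedestrian frame
    to the area of the head frame, with boxes given as (left, top, right,
    bottom) in a coordinate system whose origin is the top-left corner."""
    hl, ht, hr, hb = head_box
    bl, bt, br, bb = body_box

    head_area = max(0.0, hr - hl) * max(0.0, hb - ht)        # first area
    inter_w = max(0.0, min(hr, br) - max(hl, bl))
    inter_h = max(0.0, min(hb, bb) - max(ht, bt))
    intersection = inter_w * inter_h                          # second area

    return intersection / head_area if head_area > 0 else 0.0
```

For an unoccluded pedestrian whose head frame lies entirely inside the pedestrian frame the function returns 1, and it returns 0 when the two frames are disjoint, matching the cases discussed in the next step.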
S15: determine whether any of the area ratios is smaller than a preset first threshold, where the preset first threshold is less than 1.
In general, for a given pedestrian the head region is contained in the body region, that is, the head frame is contained in the pedestrian frame. When the pedestrian is not occluded, the head region is completely contained in the body region and the calculated area ratio should be 1. When the pedestrian is partially occluded, the head region is only partially contained in the body region and the calculated area ratio is less than 1. When the pedestrian's body region is completely occluded, the head region is not contained in the body region at all and the calculated area ratio is 0.
In this embodiment, a first threshold can be preset; the preset first threshold is less than 1 and may be, for example, 0.7.
Whether any pedestrian in the image to be detected is occluded is determined by comparing the calculated area ratios with the preset threshold. In other words, the ratio of the intersection of the head frame and the pedestrian frame to the head frame measures how much the head frame and the pedestrian frame overlap, or equivalently whether the head frame matches the pedestrian frame. The larger the area ratio, the larger the overlap between the head frame and the pedestrian frame, and the better the head frame matches the pedestrian frame.
S16: when an area ratio is smaller than the preset first threshold, determine that a pedestrian in the image to be detected is occluded.
In this embodiment, if multiple area ratios are calculated, each area ratio can be compared with the preset first threshold. If a target area ratio among them is smaller than the preset first threshold, the pedestrian corresponding to that target area ratio in the image to be detected is severely occluded. If every area ratio is greater than or equal to the preset first threshold, the pedestrians in the image to be detected are either not occluded or not severely occluded.
S17: segment the occluded pedestrian according to the head region and the body region.
In this embodiment, when it is determined that a pedestrian in the image to be detected is severely occluded, the severely occluded pedestrian can first be segmented from the image according to the head region and the body region.
Specifically, segmenting the occluded pedestrian according to the head region and the body region includes:
determining whether the area ratio is greater than a preset second threshold, where the preset second threshold is less than the preset first threshold;
when the area ratio is greater than the preset second threshold, enlarging the body region according to a preset scale factor and segmenting the occluded pedestrian according to the enlarged body region;
when the area ratio is less than or equal to the preset second threshold, segmenting the occluded pedestrian by taking the central axis of the two head regions as the dividing line and the shoulder key points as the boundary.
Where crowds gather, there is a high probability that a pedestrian A will be occluded by a pedestrian B. In that case pedestrian B is detected without any problem, but because the body of pedestrian A is partially blocked by pedestrian B, two situations can arise: in the first, part of pedestrian A remains unoccluded; in the second, pedestrian A is almost completely occluded. A second threshold can be preset; the preset second threshold is less than the preset first threshold and may be, for example, 0.3. By further comparing the area ratio with the preset second threshold, it can be determined whether pedestrian A is almost completely occluded.
For the first situation: when the area ratio is greater than the preset second threshold but less than the preset first threshold, the corresponding pedestrian is severely occluded; the pedestrian's head region has been detected accurately, but it does not match the pedestrian's body. The confidence of such a detection is relatively low, and it is easily discarded as a false detection during post-processing. Enlarging the corresponding body region by a preset scale factor (for example, 1.5) before segmentation raises the detection confidence of the occluded pedestrian and thus reduces the risk of the detection being filtered out during post-processing.
For the second situation: when the area ratio is less than or equal to the preset second threshold, pedestrian A and pedestrian B share one body frame but correspond to two head frames. In this case the body frame can be marked as double, and pedestrian A is separated out along the central axis of the two head frames, with the shoulder key points serving as the left and right boundaries of the body.
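An illustrative sketch of the two segmentation cases, using the example values 0.3 and 1.5 from the text; the box representation (left, top, right, bottom) and the shoulder key points passed as (x, y) tuples are assumptions of the example.

```python
def segment_occluded_pedestrian(ratio, body_box, head_box_a, head_box_b,
                                shoulders_a, second_threshold=0.3, scale=1.5):
    """Return a box for the occluded pedestrian A. Case 1 enlarges the body
    frame about its centre; case 2 splits a shared body frame along the
    central axis between the two head frames, bounded by A's shoulders."""
    left, top, right, bottom = body_box

    if ratio > second_threshold:
        # Case 1: partially occluded; enlarge the body frame by the preset
        # scale factor before segmentation to raise its detection confidence.
        cx, cy = (left + right) / 2.0, (top + bottom) / 2.0
        w, h = (right - left) * scale, (bottom - top) * scale
        return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

    # Case 2: almost completely occluded; A and B share one body frame but
    # have two head frames, so split along the axis between the heads with
    # A's shoulder key points as the outer left/right boundary.
    centre_a = (head_box_a[0] + head_box_a[2]) / 2.0
    centre_b = (head_box_b[0] + head_box_b[2]) / 2.0
    axis_x = (centre_a + centre_b) / 2.0
    shoulder_xs = [x for x, _ in shoulders_a]
    if centre_a < centre_b:                      # pedestrian A is on the left
        return (min(shoulder_xs), top, axis_x, bottom)
    return (axis_x, top, max(shoulder_xs), bottom)
```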
S18: invoke a preset tracking algorithm to track the segmented occluded pedestrian and the unoccluded pedestrians.
In this embodiment, the preset tracking algorithm may be a multi-target tracking algorithm. After the pedestrians in the image to be detected have been segmented, target tracking can be performed on the segmented pedestrians and the unoccluded pedestrians.
The multi-target tracking algorithm itself is existing technology and is not elaborated here.
Preferably, when the area ratio is greater than or equal to the preset first threshold, the method further includes:
determining whether the area ratio is 1;
when the area ratio is 1, determining the pedestrian corresponding to the body region as the target tracking object;
when the area ratio is not 1, determining the pedestrian corresponding to the head region as the target tracking object;
invoking the preset tracking algorithm to track the target tracking object.
In this embodiment, when an area ratio is greater than or equal to the preset first threshold, that is, when the pedestrians in the image to be detected are either not occluded or not severely occluded, it is further determined whether the area ratio is 1, in order to distinguish pedestrians that are not occluded from pedestrians that are only slightly occluded.
If every area ratio is 1, the head region of each pedestrian in the image to be detected is completely contained in the body region, that is, the pedestrian is not occluded. Since the determined body region then covers the whole pedestrian, tracking with the pedestrian corresponding to the body region as the target tracking object gives a better result. If an area ratio is not 1, a pedestrian in the image to be detected is slightly occluded; since the head region is more distinctive, tracking with the pedestrian corresponding to the head region as the target tracking object gives a better result in that case.
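A minimal sketch of this selection rule, using the example threshold 0.7; the region objects are whatever representation the preset tracking algorithm consumes, and severely occluded pedestrians (ratio below the threshold) are assumed to have already been handled by the segmentation step above.

```python
def choose_target(ratio, head_region, body_region, first_threshold=0.7):
    """Pick the target tracking object for a pedestrian that is not, or only
    slightly, occluded."""
    if ratio < first_threshold:
        return None          # severely occluded: handled by segmentation instead
    if ratio == 1:
        return body_region   # not occluded: the body region covers the whole pedestrian
    return head_region       # slightly occluded: the head region is more distinctive
```

The returned region would then be handed to the preset tracking algorithm as the target tracking object.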
In summary, the multi-target tracking method described in this application first acquires an image to be detected that contains multiple targets, some of which may be occluded, invokes a preset first detection model and a preset second detection model to detect the head regions and body regions in the image respectively, and calculates an area ratio for each head region and body region. When an area ratio is smaller than the preset first threshold, it is determined that a pedestrian in the image is occluded, the occluded pedestrian is segmented according to the head region and the body region, and a preset tracking algorithm is finally invoked to track the segmented occluded pedestrian and the unoccluded pedestrians. The application measures the occlusion of pedestrians through the area ratio, so that occluded pedestrians can still be detected; in addition, the target tracking object is determined jointly from the head region and the body region, which reduces missed and false detections caused by a pedestrian's body being occluded and improves the tracking effect. The method can therefore be applied to scenes with complex backgrounds and, in particular, can track targets quickly and accurately when multiple targets occlude one another, which gives it high practical value.
Embodiment 2
FIG. 2 is a structural diagram of the multi-target tracking apparatus provided in Embodiment 2 of the present application.
In some embodiments, the multi-target tracking apparatus 20 may include multiple functional modules composed of computer-readable instruction code segments. The code of each computer-readable instruction code segment in the multi-target tracking apparatus 20 may be stored in the memory of the computer device and executed by the at least one processor in order to detect multiple targets that occlude one another (see the description of FIG. 1 for details).
In this embodiment, the multi-target tracking apparatus 20 can be divided into multiple functional modules according to the functions it performs. The functional modules may include an acquisition module 201, a detection module 202, a training module 203, a calculation module 204, a judgment module 205, a determination module 206, a segmentation module 207, and a tracking module 208. A module referred to in this application is a series of computer-readable instruction segments that can be executed by at least one processor, can complete a fixed function, and are stored in a memory. The functions of the modules are described in detail below.
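Purely as an illustration of how these modules could be composed, the sketch below wires them together in Python; the method names on the module objects are assumptions made for the example (the training module 203 is omitted because it runs offline), not interfaces defined by this application.

```python
class MultiTargetTrackingApparatus:
    """Illustrative composition of the functional modules of apparatus 20."""

    def __init__(self, acquisition, detection, calculation, judgment,
                 determination, segmentation, tracking):
        self.acquisition = acquisition      # module 201
        self.detection = detection          # module 202
        self.calculation = calculation      # module 204
        self.judgment = judgment            # module 205
        self.determination = determination  # module 206
        self.segmentation = segmentation    # module 207
        self.tracking = tracking            # module 208

    def run(self, source):
        image = self.acquisition.acquire(source)
        heads, bodies = self.detection.detect(image)
        ratios = self.calculation.area_ratios(heads, bodies)
        if self.judgment.any_below_first_threshold(ratios):
            self.determination.mark_occluded(image, ratios)
            targets = self.segmentation.segment(image, heads, bodies)
        else:
            targets = bodies
        self.tracking.track(targets)
```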
The acquisition module 201 is configured to acquire an image to be detected that contains multiple targets.
In this embodiment, the image to be detected may be any suitable image on which target tracking needs to be performed, for example an image captured of a monitored area. The image to be detected may be a static image captured by an image acquisition device such as a camera, or any video frame of a video captured by such a device.
The image to be detected may be an original image, or an image obtained after preprocessing an original image.
In this embodiment, the image to be detected contains multiple pedestrians whose body parts may overlap considerably. The target tracking objects are therefore determined even when the body parts of multiple pedestrians overlap substantially, so that a pedestrian is not misdetected or missed because it is occluded by other pedestrians.
The detection module 202 is configured to invoke a preset first detection model to detect the head regions in the image to be detected.
In this embodiment, the first detection model can be trained in advance. By directly invoking the pre-trained first detection model, the multiple human body nodes of each human body in the image to be detected can be detected directly and quickly. The preset first detection model may be any of various deep-learning-based detection models, for example a detection model based on a neural network or a detection model based on a residual network.
The training module 203 is configured to train the first detection model in advance, where the training process of the first detection model includes:
1) acquiring multiple human body pictures, and manually annotating multiple human body nodes in the head region of each human body picture to obtain a sample picture set;
2) extracting a first preset proportion of the human body pictures from the sample picture set as the sample picture set to be trained, and extracting a second preset proportion of the human body pictures from the sample picture set as the sample picture set to be verified;
3) training a preset neural network with the sample picture set to be trained to obtain the first detection model, and verifying the trained first detection model with the sample picture set to be verified;
4) if the verification pass rate is greater than or equal to a preset threshold, the training of the first detection model is complete; otherwise, increasing the number of human body pictures in the sample picture set to be trained and training and verifying the first detection model again.
For example, assume that 100,000 human body pictures are acquired and that tools such as OpenPose or PoseMachine are used to annotate multiple human body nodes in the head region of each picture, for example the left-eye node, right-eye node, left-ear node, and right-ear node. A first preset proportion of the human body pictures is extracted as the sample picture set to be trained (the training set for short), and a second preset proportion is extracted as the sample picture set to be verified (the verification set for short). The number of pictures in the training set is much larger than that in the verification set; for example, 80% of the human body pictures are used as the training set and the remaining 20% as the verification set.
When the neural network is trained for the first time to obtain the first detection model, the network uses its default parameters; the parameters are then adjusted continuously during training. After the first detection model has been generated, it is verified with the human body pictures in the verification set. If the verification pass rate is greater than or equal to a preset threshold, for example 98%, training ends and the trained first detection model is used to recognize human body nodes. If the verification pass rate is below the preset threshold, for example below 98%, the number of human body pictures participating in training is increased and the above steps are repeated until the verification pass rate reaches the preset threshold.
During testing, the trained first detection model is used to recognize human body nodes in the pictures of the verification set, and the recognition results are compared with the annotated nodes of those pictures to evaluate the recognition performance of the trained first detection model.
Preferably, the detection module 202 invoking the preset first detection model to detect the head regions in the image to be detected includes:
1) invoking the preset first detection model to detect multiple human body nodes of each human body in the image to be detected;
In this embodiment, the multiple human body nodes of each human body in the image to be detected are detected by the preset first detection model, for example a neural network model.
A human body node may be an important location on the human body, such as a joint or a facial feature. The multiple human body nodes include at least several nodes of the head and neck. For example, the multiple human body nodes include one or more of a neck node, a nose-tip node, a left-eye node, a right-eye node, a left-ear node, and a right-ear node. In other embodiments, the multiple human body nodes determined by the preset first detection model further include at least a wrist node, an elbow node, and a shoulder node.
Each human body node represents the body region that contains that node; for example, the left-eye node represents the entire left-eye region of the human body rather than a single specific pixel.
2) determining the head region corresponding to each human body in the image to be detected according to the multiple human body nodes of that human body.
In this embodiment, the head region is a region determined from the multiple nodes of the head and neck and used to characterize the human head. For example, the head region is determined from the neck node, nose-tip node, left-eye node, right-eye node, left-ear node, and right-ear node. The determined head region may be rectangular, circular, elliptical, or of any other regular or irregular shape; this application does not specifically limit the shape of the determined head region.
In this embodiment, training the first detection model in advance may be an offline process, while invoking the first detection model to detect the head regions in the image to be detected may be an online process. That is, the image to be detected is used as the input of the first detection model, and the output is the human body node information in the image; for example, the top of the head, the eyes, the mouth, the chin, the ears, and the neck are each presented as a human body node. The human head is then enclosed by a geometric figure, such as a rectangular box, determined from these nodes; this rectangular box is called the head frame.
The detection module 202 is further configured to invoke a preset second detection model to detect the body regions in the image to be detected.
In this embodiment, after the image to be detected is acquired, a preset second detection model is invoked to detect the body regions in the image to be detected. The preset second detection model may be implemented with an accelerated region-based convolutional neural network (Faster-RCNN).
The preset second detection model is trained in advance on a large number of human body images, and may be trained before the image to be detected containing multiple targets is acquired. The process of training the second detection model in advance is similar to the process of training the first detection model described above and is not repeated here.
The body regions in the image to be detected are recognized by inputting the image to be detected into the second detection model.
In this embodiment, training the second detection model in advance may be an offline process, while invoking the preset second detection model to detect the body regions in the image to be detected may be an online process. That is, the image to be detected is used as the input of the second detection model, and the output is the human body information in the image; according to this information, each human body region is enclosed by a rectangular box, which is called the pedestrian frame.
Preferably, the preset first detection model is invoked to detect the head regions and the preset second detection model is invoked to detect the body regions in the image to be detected simultaneously, in a parallel processing manner. In this embodiment, feeding the image to be detected into the preset first detection model to determine the head regions and, at the same time, into the preset second detection model to determine the body regions saves processing time and improves processing efficiency.
The calculation module 204 is configured to calculate area ratios based on the head regions and the body regions.
In this embodiment, after the multiple head regions and multiple body regions in the image to be detected have been determined, an area ratio can be calculated for each head region and body region.
The area ratio is the ratio of the area of the intersection of a head region and a body region to the area of the head region.
Preferably, the calculation module 204 determining the area ratio based on the head region and the body region includes:
establishing a position coordinate system based on the image to be detected;
acquiring a first area, namely the area occupied by the head region in the position coordinate system;
acquiring a second area, namely the area occupied by the intersection of the head region and the body region in the position coordinate system;
calculating the area ratio based on the first area and the second area.
In this embodiment, the position coordinate system is established with the upper-left corner of the image to be detected as the origin, the top edge of the image as the X axis, and the left edge of the image as the Y axis.
After the position coordinate system has been established, the first position coordinates of the vertices of the head frame corresponding to the head region (taking a rectangular frame as an example) and the second position coordinates of the vertices of the body frame corresponding to the body region (again taking a rectangular frame as an example) are acquired. The first area of the head region is determined from the first position coordinates, the intersection of the head region and the body region is determined from the first and second position coordinates, the third position coordinates of the vertices of the intersection are then acquired, and the second area of the intersection is determined from the third position coordinates. Finally, the area ratio (Intersection over Union, IOU) is calculated from the first area and the second area.
The judgment module 205 is configured to determine whether any of the area ratios is smaller than a preset first threshold, where the preset first threshold is less than 1.
In general, for a given pedestrian the head region is contained in the body region, that is, the head frame is contained in the pedestrian frame. When the pedestrian is not occluded, the head region is completely contained in the body region and the calculated area ratio should be 1. When the pedestrian is partially occluded, the head region is only partially contained in the body region and the calculated area ratio is less than 1. When the pedestrian's body region is completely occluded, the head region is not contained in the body region at all and the calculated area ratio is 0.
In this embodiment, a first threshold can be preset; the preset first threshold is less than 1 and may be, for example, 0.7.
Whether any pedestrian in the image to be detected is occluded is determined by comparing the calculated area ratios with the preset threshold. In other words, the ratio of the intersection of the head frame and the pedestrian frame to the head frame measures how much the head frame and the pedestrian frame overlap, or equivalently whether the head frame matches the pedestrian frame. The larger the area ratio, the larger the overlap between the head frame and the pedestrian frame, and the better the head frame matches the pedestrian frame.
The determination module 206 is configured to determine, when an area ratio is smaller than the preset first threshold, that a pedestrian in the image to be detected is occluded.
In this embodiment, if multiple area ratios are calculated, each area ratio can be compared with the preset first threshold. If a target area ratio among them is smaller than the preset first threshold, the pedestrian corresponding to that target area ratio in the image to be detected is severely occluded. If every area ratio is greater than or equal to the preset first threshold, the pedestrians in the image to be detected are either not occluded or not severely occluded.
The segmentation module 207 is configured to segment the occluded pedestrian according to the head region and the body region.
In this embodiment, when it is determined that a pedestrian in the image to be detected is severely occluded, the severely occluded pedestrian can first be segmented from the image according to the head region and the body region.
Specifically, the segmentation module 207 segmenting the occluded pedestrian according to the head region and the body region includes:
determining whether the area ratio is greater than a preset second threshold, where the preset second threshold is less than the preset first threshold;
when the area ratio is greater than the preset second threshold, enlarging the body region according to a preset scale factor and segmenting the occluded pedestrian according to the enlarged body region;
when the area ratio is less than or equal to the preset second threshold, segmenting the occluded pedestrian by taking the central axis of the two head regions as the dividing line and the shoulder key points as the boundary.
Where crowds gather, there is a high probability that a pedestrian A will be occluded by a pedestrian B. In that case pedestrian B is detected without any problem, but because the body of pedestrian A is partially blocked by pedestrian B, two situations can arise: in the first, part of pedestrian A remains unoccluded; in the second, pedestrian A is almost completely occluded. A second threshold can be preset; the preset second threshold is less than the preset first threshold and may be, for example, 0.3. By further comparing the area ratio with the preset second threshold, it can be determined whether pedestrian A is almost completely occluded.
For the first situation: when the area ratio is greater than the preset second threshold but less than the preset first threshold, the corresponding pedestrian is severely occluded; the pedestrian's head region has been detected accurately, but it does not match the pedestrian's body. The confidence of such a detection is relatively low, and it is easily discarded as a false detection during post-processing. Enlarging the corresponding body region by a preset scale factor (for example, 1.5) before segmentation raises the detection confidence of the occluded pedestrian and thus reduces the risk of the detection being filtered out during post-processing.
For the second situation: when the area ratio is less than or equal to the preset second threshold, pedestrian A and pedestrian B share one body frame but correspond to two head frames. In this case the body frame can be marked as double, and pedestrian A is separated out along the central axis of the two head frames, with the shoulder key points serving as the left and right boundaries of the body.
The tracking module 208 is configured to invoke a preset tracking algorithm to track the segmented occluded pedestrian and the unoccluded pedestrians.
In this embodiment, the preset tracking algorithm may be a multi-target tracking algorithm. After the pedestrians in the image to be detected have been segmented, target tracking can be performed on the segmented pedestrians and the unoccluded pedestrians.
The multi-target tracking algorithm itself is existing technology and is not elaborated here.
Preferably, the judgment module 205 is further configured to determine whether the area ratio is 1 when the area ratio is greater than or equal to the preset first threshold.
Preferably, the determination module 206 is further configured to determine the pedestrian corresponding to the body region as the target tracking object when the area ratio is 1.
Preferably, the determination module 206 is further configured to determine the pedestrian corresponding to the head region as the target tracking object when the area ratio is not 1.
Preferably, the tracking module 208 is further configured to invoke the preset tracking algorithm to track the target tracking object.
In this embodiment, when an area ratio is greater than or equal to the preset first threshold, that is, when the pedestrians in the image to be detected are either not occluded or not severely occluded, it is further determined whether the area ratio is 1, in order to distinguish pedestrians that are not occluded from pedestrians that are only slightly occluded.
If every area ratio is 1, the head region of each pedestrian in the image to be detected is completely contained in the body region, that is, the pedestrian is not occluded. Since the determined body region then covers the whole pedestrian, tracking with the pedestrian corresponding to the body region as the target tracking object gives a better result. If an area ratio is not 1, a pedestrian in the image to be detected is slightly occluded; since the head region is more distinctive, tracking with the pedestrian corresponding to the head region as the target tracking object gives a better result in that case.
In summary, the multi-target tracking apparatus described in this application first acquires an image to be detected that contains multiple targets, some of which may be occluded, invokes a preset first detection model and a preset second detection model to detect the head regions and body regions in the image respectively, and calculates an area ratio for each head region and body region. When an area ratio is smaller than the preset first threshold, it is determined that a pedestrian in the image is occluded, the occluded pedestrian is segmented according to the head region and the body region, and a preset tracking algorithm is finally invoked to track the segmented occluded pedestrian and the unoccluded pedestrians. The apparatus measures the occlusion of pedestrians through the area ratio, so that occluded pedestrians can still be detected; in addition, the target tracking object is determined jointly from the head region and the body region, which reduces missed and false detections caused by a pedestrian's body being occluded and improves the tracking effect. The apparatus can therefore be applied to scenes with complex backgrounds and, in particular, can track targets quickly and accurately when multiple targets occlude one another, which gives it high practical value.
Embodiment 3
Refer to FIG. 3, which is a schematic structural diagram of the computer device provided in Embodiment 3 of this application. In a preferred embodiment of this application, the computer device 3 includes a memory 31, at least one processor 32, at least one communication bus 33, and a transceiver 34.
Those skilled in the art should understand that the structure of the computer device shown in FIG. 3 does not limit the embodiments of this application: the computer device 3 may use either a bus topology or a star topology, and it may include more or fewer hardware or software components than shown, or a different arrangement of components.
In some embodiments, the computer device 3 is a device capable of automatically performing numerical calculation and/or information processing according to instructions set or stored in advance; its hardware includes, but is not limited to, microprocessors, application-specific integrated circuits, programmable gate arrays, digital processors, and embedded devices. The computer device 3 may also include a client device, which includes, but is not limited to, any electronic product that can interact with a user through a keyboard, a mouse, a remote control, a touch panel, a voice-control device, or the like, for example a personal computer, a tablet computer, a smartphone, or a digital camera.
It should be noted that the computer device 3 is only an example; other existing or future electronic products that can be adapted to this application are also included in the scope of protection of this application and are incorporated herein by reference.
In some embodiments, the memory 31 is used to store computer-readable instruction code and various data, for example the multi-target tracking apparatus 20 installed in the computer device 3, and to provide high-speed, automatic access to the instructions or data while the computer device 3 is running. The memory 31 includes read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), one-time programmable read-only memory (OTPROM), electrically erasable programmable read-only memory (EEPROM), compact disc read-only memory (CD-ROM) or other optical storage, magnetic disk storage, magnetic tape storage, or any other computer-readable medium that can be used to carry or store data.
In some embodiments, the at least one processor 32 consists of one or more integrated circuits, for example a single packaged integrated circuit or several packaged integrated circuits with the same or different functions, and includes one or more central processing units (CPUs), microprocessors, digital processing chips, graphics processors, combinations of various control chips, and the like. The at least one processor 32 is the control unit of the computer device 3: it connects the components of the whole computer device 3 through various interfaces and lines, and it executes the various functions of the computer device 3 and processes data, for example performing multi-target tracking, by running or executing the computer-readable instructions or modules stored in the memory 31 and by calling the data stored in the memory 31.
In some embodiments, the at least one communication bus 33 is configured to implement connection and communication between the memory 31, the at least one processor 32, and the other components.
Although not shown, the computer device 3 may also include a power supply that powers its components. Preferably, the power supply is logically connected to the at least one processor 32 through a power management device, so that charging, discharging, power-consumption management, and similar functions are handled by the power management device. The computer device 3 may further include various sensors, a Bluetooth module, a Wi-Fi module, and so on, which are not described in detail here.
The integrated units implemented in the form of software function modules described above may be stored in a non-volatile readable storage medium, which includes several instructions for causing a computer device (which may be a personal computer, a computer device, or a network device) or a processor to execute parts of the methods described in the embodiments of this application.
In a further embodiment, with reference to FIG. 2, the at least one processor 32 can execute the operating system of the computer device 3 as well as the installed applications (such as the multi-target tracking apparatus 20), for example the modules described above.
The memory 31 stores computer-readable instruction code, and the at least one processor 32 can call the computer-readable instruction code stored in the memory 31 to perform the related functions. For example, the modules described in FIG. 2 are computer-readable instruction code stored in the memory 31 and executed by the at least one processor 32, so that the functions of these modules are realized and multi-target tracking is achieved.
In one embodiment of this application, the memory 31 stores multiple instructions, and the multiple instructions are executed by the at least one processor 32 to implement multi-target tracking. For the specific way in which the at least one processor 32 implements these instructions, reference may be made to the description of the relevant steps in the embodiment corresponding to FIG. 1, which is not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the division into modules is only a division by logical function, and other divisions may be used in an actual implementation.
The modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. In addition, the functional modules in the embodiments of this application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software function modules.
From whatever point of view, the embodiments should be regarded as exemplary and non-restrictive. The scope of this application is defined by the appended claims rather than by the above description, and all changes that fall within the meaning and scope of equivalents of the claims are therefore intended to be included in this application. Furthermore, the word "comprising" does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or apparatuses recited in the apparatus claims may also be implemented by a single unit or apparatus through software or hardware. Words such as "first" and "second" are used to denote names and do not denote any particular order.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of this application and not to limit them. Although this application has been described in detail with reference to the preferred embodiments, those of ordinary skill in the art should understand that the technical solutions of this application may be modified or equivalently replaced without departing from the spirit and scope of the technical solutions of this application.
Claims (20)
- A multi-target tracking method, characterized in that the method comprises: acquiring an image to be detected that contains multiple targets; calling a preset first detection model to detect the head areas in the image to be detected; calling a preset second detection model to detect the body areas in the image to be detected; calculating area ratios from the head areas and the body areas; determining whether any of the area ratios is smaller than a preset first threshold, wherein the preset first threshold is less than 1; when an area ratio is smaller than the preset first threshold, determining that an occluded pedestrian exists in the image to be detected; segmenting the occluded pedestrian according to the head area and the body area; and calling a preset tracking algorithm to track the segmented occluded pedestrian and the unoccluded pedestrians.
- The method according to claim 1, characterized in that, when the area ratio is greater than or equal to the preset first threshold, the method further comprises: determining whether the area ratio is 1; when the area ratio is 1, determining the pedestrian corresponding to the body area as the target tracking object; when the area ratio is not 1, determining the pedestrian corresponding to the head area as the target tracking object; and calling the preset tracking algorithm to track the target tracking object.
- The method according to claim 1, characterized in that segmenting the occluded pedestrian according to the head area and the body area comprises: determining whether the area ratio is greater than a preset second threshold, wherein the preset second threshold is less than the preset first threshold; when the area ratio is greater than the preset second threshold, expanding the body area according to a preset proportional coefficient; and segmenting the occluded pedestrian according to the expanded body area.
- The method according to claim 3, characterized in that, when the area ratio is less than or equal to the preset second threshold, the method further comprises: segmenting the occluded pedestrian by taking the central axis between the two head areas as the dividing line and the key points of the shoulders as the boundary.
- The method according to claim 1, characterized in that calling the preset first detection model to detect the head areas in the image to be detected and calling the preset second detection model to detect the body areas in the image to be detected are performed simultaneously by means of parallel processing.
- The method according to claim 1, characterized in that calling the preset first detection model to detect the head areas in the image to be detected comprises: calling the preset first detection model to detect multiple human-body nodes of each human body in the image to be detected; and determining the head area of each human body in the image to be detected according to the multiple human-body nodes of that human body.
- The method according to any one of claims 1 to 6, characterized in that calculating the area ratio from the head area and the body area comprises: establishing a position coordinate system according to the image to be detected; obtaining a first area of the head area in the position coordinate system; obtaining a second area, in the position coordinate system, of the intersection of the head area and the body area; and calculating the area ratio from the first area and the second area.
- A multi-target tracking apparatus, characterized in that the apparatus comprises: an acquiring module, configured to acquire an image to be detected that contains multiple targets; a detection module, configured to call a preset first detection model to detect the head areas in the image to be detected, and further configured to call a preset second detection model to detect the body areas in the image to be detected; a calculation module, configured to calculate area ratios from the head areas and the body areas; a judging module, configured to determine whether any of the area ratios is smaller than a preset first threshold, wherein the preset first threshold is less than 1; a segmentation module, configured to determine that an occluded pedestrian exists in the image to be detected when an area ratio is smaller than the preset first threshold, and to segment the occluded pedestrian according to the head area and the body area; and a tracking module, configured to call a preset tracking algorithm to track the segmented occluded pedestrian and the unoccluded pedestrians.
- A computer device, characterized in that the computer device comprises a processor and a memory, the memory stores computer-readable instructions, and the processor executes the computer-readable instructions to implement the following steps: acquiring an image to be detected that contains multiple targets; calling a preset first detection model to detect the head areas in the image to be detected; calling a preset second detection model to detect the body areas in the image to be detected; calculating area ratios from the head areas and the body areas; determining whether any of the area ratios is smaller than a preset first threshold, wherein the preset first threshold is less than 1; when an area ratio is smaller than the preset first threshold, determining that an occluded pedestrian exists in the image to be detected; segmenting the occluded pedestrian according to the head area and the body area; and calling a preset tracking algorithm to track the segmented occluded pedestrian and the unoccluded pedestrians.
- The computer device according to claim 9, characterized in that, when the area ratio is greater than or equal to the preset first threshold, the processor further executes the computer-readable instructions to implement the following steps: determining whether the area ratio is 1; when the area ratio is 1, determining the pedestrian corresponding to the body area as the target tracking object; when the area ratio is not 1, determining the pedestrian corresponding to the head area as the target tracking object; and calling the preset tracking algorithm to track the target tracking object.
- The computer device according to claim 9, characterized in that, when executing the computer-readable instructions to segment the occluded pedestrian according to the head area and the body area, the processor implements the following steps: determining whether the area ratio is greater than a preset second threshold, wherein the preset second threshold is less than the preset first threshold; when the area ratio is greater than the preset second threshold, expanding the body area according to a preset proportional coefficient; and segmenting the occluded pedestrian according to the expanded body area.
- The computer device according to claim 11, characterized in that, when the area ratio is less than or equal to the preset second threshold, the processor further executes the computer-readable instructions to implement the following step: segmenting the occluded pedestrian by taking the central axis between the two head areas as the dividing line and the key points of the shoulders as the boundary.
- The computer device according to claim 9, characterized in that, when executing the computer-readable instructions to call the preset first detection model to detect the head areas in the image to be detected, the processor implements the following steps: calling the preset first detection model to detect multiple human-body nodes of each human body in the image to be detected; and determining the head area of each human body in the image to be detected according to the multiple human-body nodes of that human body.
- The computer device according to any one of claims 9 to 13, characterized in that, when executing the computer-readable instructions to calculate the area ratio from the head area and the body area, the processor implements the following steps: establishing a position coordinate system according to the image to be detected; obtaining a first area of the head area in the position coordinate system; obtaining a second area, in the position coordinate system, of the intersection of the head area and the body area; and calculating the area ratio from the first area and the second area.
- A non-volatile readable storage medium storing computer-readable instructions, characterized in that, when the computer-readable instructions are executed by a processor, the following steps are implemented: acquiring an image to be detected that contains multiple targets; calling a preset first detection model to detect the head areas in the image to be detected; calling a preset second detection model to detect the body areas in the image to be detected; calculating area ratios from the head areas and the body areas; determining whether any of the area ratios is smaller than a preset first threshold, wherein the preset first threshold is less than 1; when an area ratio is smaller than the preset first threshold, determining that an occluded pedestrian exists in the image to be detected; segmenting the occluded pedestrian according to the head area and the body area; and calling a preset tracking algorithm to track the segmented occluded pedestrian and the unoccluded pedestrians.
- The storage medium according to claim 15, characterized in that, when the area ratio is greater than or equal to the preset first threshold, the computer-readable instructions, when executed by the processor, further implement the following steps: determining whether the area ratio is 1; when the area ratio is 1, determining the pedestrian corresponding to the body area as the target tracking object; when the area ratio is not 1, determining the pedestrian corresponding to the head area as the target tracking object; and calling the preset tracking algorithm to track the target tracking object.
- The storage medium according to claim 15, characterized in that, when the computer-readable instructions are executed by the processor to segment the occluded pedestrian according to the head area and the body area, the following steps are implemented: determining whether the area ratio is greater than a preset second threshold, wherein the preset second threshold is less than the preset first threshold; when the area ratio is greater than the preset second threshold, expanding the body area according to a preset proportional coefficient; and segmenting the occluded pedestrian according to the expanded body area.
- The storage medium according to claim 17, characterized in that, when the area ratio is less than or equal to the preset second threshold, the computer-readable instructions, when executed by the processor, further implement the following step: segmenting the occluded pedestrian by taking the central axis between the two head areas as the dividing line and the key points of the shoulders as the boundary.
- The storage medium according to claim 15, characterized in that, when the computer-readable instructions are executed by the processor to call the preset first detection model to detect the head areas in the image to be detected, the following steps are implemented: calling the preset first detection model to detect multiple human-body nodes of each human body in the image to be detected; and determining the head area of each human body in the image to be detected according to the multiple human-body nodes of that human body.
- The storage medium according to any one of claims 15 to 19, characterized in that, when the computer-readable instructions are executed by the processor to calculate the area ratio from the head area and the body area, the following steps are implemented: establishing a position coordinate system according to the image to be detected; obtaining a first area of the head area in the position coordinate system; obtaining a second area, in the position coordinate system, of the intersection of the head area and the body area; and calculating the area ratio from the first area and the second area.
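The following Python sketch is offered purely as a reading aid for the segmentation described in claims 3, 4, 11, 12, 17 and 18; it is not part of the claims and does not reproduce the disclosed implementation. The threshold and coefficient values, the (x1, y1, x2, y2) box format, and the straight vertical-axis split are assumptions, and the shoulder-key-point boundary of claims 4, 12 and 18 is simplified away.

```python
def expand_box(box, coeff=1.2):
    # Mild occlusion (ratio above the second threshold): enlarge the body box
    # around its centre by a preset proportional coefficient so that the whole
    # pedestrian is covered. The value 1.2 is illustrative only.
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    half_w, half_h = (x2 - x1) * coeff / 2.0, (y2 - y1) * coeff / 2.0
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)

def split_along_head_axis(body_box, head_box_a, head_box_b):
    # Heavier occlusion: split the merged body box along the vertical axis midway
    # between the two head boxes. The claims additionally bound the split with the
    # shoulder key points; that refinement is omitted in this sketch.
    centre_a = (head_box_a[0] + head_box_a[2]) / 2.0
    centre_b = (head_box_b[0] + head_box_b[2]) / 2.0
    axis = (centre_a + centre_b) / 2.0
    x1, y1, x2, y2 = body_box
    left = (x1, y1, min(axis, x2), y2)
    right = (max(axis, x1), y1, x2, y2)
    return (left, right) if centre_a <= centre_b else (right, left)

def segment_occluded(body_box, head_box, other_head_box, ratio,
                     second_threshold=0.4, coeff=1.2):
    # Dispatcher mirroring claims 3 and 4: expand the body box for mild occlusion,
    # otherwise split between the two heads. second_threshold is a placeholder
    # value that must stay below the preset first threshold.
    if ratio > second_threshold:
        return [expand_box(body_box, coeff)]
    return list(split_along_head_axis(body_box, head_box, other_head_box))
```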
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910345956.8A CN110210302B (en) | 2019-04-26 | 2019-04-26 | Multi-target tracking method, device, computer equipment and storage medium |
CN201910345956.8 | | | |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020215552A1 true WO2020215552A1 (en) | 2020-10-29 |
Family
ID=67786374
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2019/102318 WO2020215552A1 (en) | 2019-04-26 | 2019-08-23 | Multi-target tracking method, apparatus, computer device, and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110210302B (en) |
WO (1) | WO2020215552A1 (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111080697B (en) * | 2019-10-29 | 2024-04-09 | 京东科技信息技术有限公司 | Method, apparatus, computer device and storage medium for detecting direction of target object |
CN112101139B (en) * | 2020-08-27 | 2024-05-03 | 普联国际有限公司 | Human shape detection method, device, equipment and storage medium |
CN112330714B (en) * | 2020-09-29 | 2024-01-09 | 深圳大学 | Pedestrian tracking method and device, electronic equipment and storage medium |
CN112926410B (en) * | 2021-02-03 | 2024-05-14 | 深圳市维海德技术股份有限公司 | Target tracking method, device, storage medium and intelligent video system |
CN114220119B (en) * | 2021-11-10 | 2022-08-12 | 深圳前海鹏影数字软件运营有限公司 | Human body posture detection method, terminal device and computer readable storage medium |
CN117876968B (en) * | 2024-03-11 | 2024-05-28 | 盛视科技股份有限公司 | Dense pedestrian detection method combining multiple targets |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9792491B1 (en) * | 2014-03-19 | 2017-10-17 | Amazon Technologies, Inc. | Approaches for object tracking |
CN105303191A (en) * | 2014-07-25 | 2016-02-03 | 中兴通讯股份有限公司 | Method and apparatus for counting pedestrians in foresight monitoring scene |
- 2019-04-26: CN application CN201910345956.8A, patent CN110210302B (en), status: Active
- 2019-08-23: WO application PCT/CN2019/102318, publication WO2020215552A1 (en), status: Active, Application Filing
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150104066A1 (en) * | 2013-10-10 | 2015-04-16 | Canon Kabushiki Kaisha | Method for improving tracking in crowded situations using rival compensation |
CN108256404A (en) * | 2016-12-29 | 2018-07-06 | 北京旷视科技有限公司 | Pedestrian detection method and device |
CN108062536A (en) * | 2017-12-29 | 2018-05-22 | 纳恩博(北京)科技有限公司 | A kind of detection method and device, computer storage media |
CN108920997A (en) * | 2018-04-10 | 2018-11-30 | 国网浙江省电力有限公司信息通信分公司 | Judge that non-rigid targets whether there is the tracking blocked based on profile |
CN109035295A (en) * | 2018-06-25 | 2018-12-18 | 广州杰赛科技股份有限公司 | Multi-object tracking method, device, computer equipment and storage medium |
CN109446942A (en) * | 2018-10-12 | 2019-03-08 | 北京旷视科技有限公司 | Method for tracking target, device and system |
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112308073A (en) * | 2020-11-06 | 2021-02-02 | 中冶赛迪重庆信息技术有限公司 | Method, system, equipment and medium for identifying loading and unloading transshipment state of scrap steel train |
CN112308073B (en) * | 2020-11-06 | 2023-08-25 | 中冶赛迪信息技术(重庆)有限公司 | Method, system, equipment and medium for identifying loading and unloading and transferring states of scrap steel train |
CN112530059A (en) * | 2020-11-24 | 2021-03-19 | 厦门熵基科技有限公司 | Channel gate inner draw-bar box judgment method, device, equipment and storage medium |
CN112489086A (en) * | 2020-12-11 | 2021-03-12 | 北京澎思科技有限公司 | Target tracking method, target tracking device, electronic device, and storage medium |
CN113158732A (en) * | 2020-12-31 | 2021-07-23 | 深圳市商汤科技有限公司 | Image processing method and related device |
CN113052049A (en) * | 2021-03-18 | 2021-06-29 | 国网内蒙古东部电力有限公司 | Off-duty detection method and device based on artificial intelligence tool identification |
CN113052049B (en) * | 2021-03-18 | 2023-12-19 | 国网内蒙古东部电力有限公司 | Off-duty detection method and device based on artificial intelligent tool identification |
CN113253357B (en) * | 2021-03-29 | 2023-06-30 | 航天信息股份有限公司 | Method and system for determining action state of target object based on light curtain |
CN113253357A (en) * | 2021-03-29 | 2021-08-13 | 航天信息股份有限公司 | Method and system for determining action state of target object based on light curtain |
CN113312995B (en) * | 2021-05-18 | 2023-02-14 | 华南理工大学 | Anchor-free vehicle-mounted pedestrian detection method based on central axis |
CN113312995A (en) * | 2021-05-18 | 2021-08-27 | 华南理工大学 | Anchor-free vehicle-mounted pedestrian detection method based on central axis |
CN113516093A (en) * | 2021-07-27 | 2021-10-19 | 浙江大华技术股份有限公司 | Marking method and device of identification information, storage medium and electronic device |
CN113516092A (en) * | 2021-07-27 | 2021-10-19 | 浙江大华技术股份有限公司 | Method and device for determining target behavior, storage medium and electronic device |
CN114332924A (en) * | 2021-12-17 | 2022-04-12 | 河北鼎联科技有限公司 | Information processing method, device, electronic equipment and storage medium |
CN115131827A (en) * | 2022-06-29 | 2022-09-30 | 珠海视熙科技有限公司 | Passenger flow human body detection method and device, storage medium and passenger flow statistical camera |
CN116912230A (en) * | 2023-08-11 | 2023-10-20 | 海格欧义艾姆(天津)电子有限公司 | Patch welding quality detection method and device, electronic equipment and storage medium |
CN117935171A (en) * | 2024-03-19 | 2024-04-26 | 中国联合网络通信有限公司湖南省分公司 | Target tracking method and system based on gesture key points |
Also Published As
Publication number | Publication date |
---|---|
CN110210302A (en) | 2019-09-06 |
CN110210302B (en) | 2023-06-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2020215552A1 (en) | Multi-target tracking method, apparatus, computer device, and storage medium | |
CN109508688B (en) | Skeleton-based behavior detection method, terminal equipment and computer storage medium | |
CN107358149B (en) | Human body posture detection method and device | |
CN102831439B (en) | Gesture tracking method and system | |
US20220180534A1 (en) | Pedestrian tracking method, computing device, pedestrian tracking system and storage medium | |
CN109598229B (en) | Monitoring system and method based on action recognition | |
US8675917B2 (en) | Abandoned object recognition using pedestrian detection | |
US10803604B1 (en) | Layered motion representation and extraction in monocular still camera videos | |
US11062126B1 (en) | Human face detection method | |
CN113177469A (en) | Training method and device for human body attribute detection model, electronic equipment and medium | |
TWI776176B (en) | Device and method for scoring hand work motion and storage medium | |
WO2022252737A1 (en) | Image processing method and apparatus, processor, electronic device, and storage medium | |
CN111753724A (en) | Abnormal behavior identification method and device | |
CN111325133A (en) | Image processing system based on artificial intelligence recognition | |
WO2021022698A1 (en) | Following detection method and apparatus, and electronic device and storage medium | |
CN113378836A (en) | Image recognition method, apparatus, device, medium, and program product | |
CN116524435A (en) | Online invigilation method based on electronic fence and related equipment | |
CN110348272B (en) | Dynamic face recognition method, device, system and medium | |
CN111985331B (en) | Detection method and device for preventing trade secret from being stolen | |
CN111597889B (en) | Method, device and system for detecting target movement in video | |
EP4273816A1 (en) | Reducing false positive identifications during video conferencing tracking and detection | |
CN113762221B (en) | Human body detection method and device | |
Bharathi et al. | A Conceptual Real-Time Deep Learning Approach for Object Detection, Tracking and Monitoring Social Distance using Yolov5 | |
CN111325132A (en) | Intelligent monitoring system | |
KR100729265B1 (en) | A face detection method using difference image and color information |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 19926699; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 19926699; Country of ref document: EP; Kind code of ref document: A1 |