
WO2021114702A1 - Target tracking method, apparatus and system, and computer-readable storage medium - Google Patents

Target tracking method, apparatus and system, and computer-readable storage medium Download PDF

Info

Publication number
WO2021114702A1
WO2021114702A1 (PCT/CN2020/109081)
Authority
WO
WIPO (PCT)
Prior art keywords
camera
detection
frame
target
tracking
Prior art date
Application number
PCT/CN2020/109081
Other languages
French (fr)
Chinese (zh)
Inventor
任培铭
刘金杰
乐振浒
张翔
林诰
Original Assignee
中国银联股份有限公司 (China UnionPay Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中国银联股份有限公司 (China UnionPay Co., Ltd.)
Publication of WO2021114702A1 publication Critical patent/WO2021114702A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/292 Multi-camera tracking
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30196 Human being; Person
    • G06T2207/30241 Trajectory

Definitions

  • the invention belongs to the field of image processing, and specifically relates to a target tracking method, device, system and computer readable storage medium.
  • target tracking applied in the field of video surveillance has gradually become one of the hot spots in the field of computer vision research.
  • Tracking the movement trajectory of a target object usually requires acquiring an image of the surveillance area of the camera, performing target detection on the image to identify the target, and tracking the identified target object to obtain the complete trajectory of the target object.
  • due to the complexity of surveillance scenes and the limited field of view of a single camera, the cooperation of multiple cameras may be required to cover the monitoring area globally.
  • the existing multi-camera target tracking methods need to analyze images and achieve target tracking through deep learning methods. As the number of cameras increases, the demand for computing resources and communication resources increases significantly, creating a technical bottleneck for target tracking.
  • the present invention provides the following solutions.
  • a target tracking method is provided, which includes: acquiring the current frames to be tested of multiple cameras arranged in a monitoring area; performing target detection on the current frame to be tested of each of the multiple cameras in turn to obtain the detection frame set corresponding to each camera; and performing target tracking according to the detection frame set corresponding to each camera and determining the global target trajectory according to the tracking result.
  • it further includes: determining multiple frame numbers to be tested, and iteratively acquiring the current frames to be tested of the multiple cameras in time series according to those frame numbers, so as to perform target tracking iteratively; wherein the initial frame number to be tested among the multiple frame numbers corresponds to the initial global target trajectory, and the subsequent frame numbers to be tested correspond to the iteratively updated global target trajectory.
  • performing target detection on the current frame to be tested of each camera includes: inputting the current frame to be tested of each camera into a target detection model for target detection; wherein the target detection model is a pedestrian detection model obtained based on neural network training.
  • the method further includes: performing projection transformation on the bottom-center point of each detection frame in the detection frame set corresponding to each camera, according to the viewing position of each camera, to determine the ground coordinates of each detection frame.
  • the viewing areas of the multiple cameras overlap at least partially, and the method further includes: dividing the working area of each camera in the ground coordinate system according to the viewing area of each camera, wherein the working areas of the cameras do not overlap each other; if the ground coordinates of any detection frame corresponding to a first camera of the multiple cameras exceed the corresponding working area, that detection frame is removed from the detection frame set of the first camera.
  • the method further includes: cutting off non-critical areas in the working area of each camera.
  • tracking according to the detection frame set corresponding to each camera includes: adopting a multi-target tracking algorithm and performing multi-target tracking based on the detection frame set corresponding to each camera to determine the local tracking information corresponding to each camera; wherein the parameters used in multi-target tracking are determined based on the historical frames to be tested of each camera.
  • the multi-target tracking algorithm is the DeepSORT algorithm.
  • it further includes: adding an identity to each detection frame according to the local tracking information corresponding to each camera, and determining the iteratively updated global target trajectory based on the identity and ground coordinates of each detection frame.
  • it further includes: determining the association relationship between the multiple cameras according to their working areas; determining the newly added detection frames and the disappearing detection frames in the corresponding working area according to the local tracking information of each camera; associating newly added and disappearing detection frames in different working areas according to the association relationship between the cameras to obtain association information; and determining the iteratively updated global target trajectory according to the association information.
  • a target tracking device is provided, including: an acquisition unit for acquiring the current frames to be tested of multiple cameras arranged in a monitoring area; a detection unit for sequentially performing target detection on the current frame to be tested of each of the multiple cameras to obtain the detection frame set corresponding to each camera; and a tracking unit for performing target tracking according to the detection frame set corresponding to each camera and determining the global target trajectory according to the tracking result.
  • it further includes: a frame selection unit for determining multiple frame numbers to be tested and iteratively acquiring the current frames to be tested from the multiple cameras in time series according to those frame numbers, so as to perform target tracking iteratively; wherein the initial global target trajectory is obtained corresponding to the initial frame number to be tested, and the iteratively updated global target trajectory is obtained corresponding to the subsequent frame numbers to be tested.
  • the detection unit is further used to: input the current frame to be measured of each camera into the target detection model for target detection; wherein the target detection model is a pedestrian detection model obtained based on neural network training.
  • the detection unit is further configured to: after obtaining the detection frame set corresponding to each camera, perform projection transformation on the bottom-center point of each detection frame in the detection frame set, according to the viewing position of each camera, to determine the ground coordinates of each detection frame.
  • the viewing areas of the multiple cameras overlap at least partially, and the device is further configured to: divide the working area of each camera in the ground coordinate system according to the viewing area of each camera, wherein the working areas of the cameras do not overlap each other; if the ground coordinates of any detection frame corresponding to a first camera among the multiple cameras exceed the corresponding working area, that detection frame is removed from the detection frame set of the first camera.
  • the detection unit is also used to: cut off non-critical areas in the working area of each camera.
  • the tracking unit is further configured to: adopt a multi-target tracking algorithm and perform multi-target tracking based on the detection frame set corresponding to each camera to determine the local tracking information corresponding to each camera; wherein the parameters used in multi-target tracking are determined based on the historical frames to be tested of each camera.
  • the multi-target tracking algorithm is the DeepSORT algorithm.
  • the tracking unit is further configured to: add an identity to each detection frame according to the local tracking information corresponding to each camera, and determine the iteratively updated global target trajectory based on the identity and ground coordinates of each detection frame.
  • the tracking unit is further configured to: determine the association relationship between the multiple cameras according to their working areas; determine the newly added and disappearing detection frames in the corresponding working area according to the local tracking information of each camera; associate newly added and disappearing detection frames in different working areas according to the association relationship between the cameras to obtain association information; and determine the iteratively updated global target trajectory according to the association information.
  • a target tracking system is provided, including: multiple cameras arranged in a monitoring area, and a target tracking device communicatively connected with the multiple cameras; wherein the target tracking device is configured to perform the method of the first aspect.
  • a target tracking device is provided, including: one or more multi-core processors; and a memory for storing one or more programs; when the one or more programs are executed by the one or more multi-core processors, the multi-core processors are caused to: acquire the current frames to be tested of multiple cameras set in the monitoring area; sequentially perform target detection on the current frame to be tested of each of the multiple cameras to obtain the detection frame set corresponding to each camera; and perform target tracking according to the detection frame set corresponding to each camera, determining the global target trajectory according to the tracking result.
  • a computer-readable storage medium is provided, which stores a program; when the program is executed by a multi-core processor, the multi-core processor is caused to execute the method of the first aspect.
  • the current frames to be tested from the cameras are detected in sequence, and global tracking is then performed in the monitoring area based on the detection result corresponding to each camera; this realizes global tracking of target objects in multi-channel surveillance videos with fewer computing resources.
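  • The claimed pipeline (acquire synchronized frames, detect per camera, then merge into a global result) can be sketched as follows. This is an illustrative skeleton only: `detect_frame` and `update_local_tracks` are hypothetical stand-ins for the pedestrian detector and the per-camera tracker described later, not part of the patent.

```python
# Minimal sketch of the claimed pipeline: per-camera detection followed by
# global tracking. All function names and data layouts are assumptions.

def detect_frame(frame):
    """Stand-in detector: returns a list of (x, y, w, h) detection boxes."""
    return frame.get("boxes", [])

def update_local_tracks(camera_id, boxes, local_tracks):
    """Stand-in tracker: assigns a running identity to each box per camera."""
    tracks = local_tracks.setdefault(camera_id, [])
    for box in boxes:
        tracks.append((len(tracks), box))
    return tracks

def track_step(current_frames, local_tracks, global_trajectory):
    """One iteration: detect on each camera's frame, then merge globally."""
    for camera_id, frame in current_frames.items():
        boxes = detect_frame(frame)              # per-camera detection
        update_local_tracks(camera_id, boxes, local_tracks)
    # Merge all per-camera track identities into the global trajectory.
    for camera_id, tracks in local_tracks.items():
        global_trajectory[camera_id] = [tid for tid, _ in tracks]
    return global_trajectory

frames = {"cam201": {"boxes": [(10, 20, 5, 12)]}, "cam202": {"boxes": []}}
trajectory = track_step(frames, {}, {})
```

Each real component (YOLO-style detection, DeepSORT-style tracking, ground-coordinate projection) would slot into one of these stubs.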
  • Fig. 1 is a schematic flowchart of a target tracking method according to an embodiment of the present invention.
  • FIG. 2 is a schematic diagram of the ground of a monitoring area according to an embodiment of the present invention.
  • FIG. 3 is a schematic diagram of viewfinder images of multiple cameras according to an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of current frames to be measured of multiple cameras according to an embodiment of the present invention.
  • FIG. 5 is a schematic diagram of a set of detection frames corresponding to multiple cameras according to an embodiment of the present invention.
  • Fig. 6 is a schematic diagram of a global target trajectory according to an embodiment of the present invention.
  • Fig. 7 is a schematic structural diagram of a target tracking device according to an embodiment of the present invention.
  • FIG. 8 is a schematic structural diagram of a target tracking device according to another embodiment of the present invention.
  • Fig. 9 is a schematic diagram of a computer-readable storage medium according to an embodiment of the present invention.
  • FIG. 1 schematically shows a flowchart of a target tracking method 100 according to an embodiment of the present invention; the method 100 may include:
  • Step S101: Obtain the current frames to be tested of the multiple cameras set in the monitoring area.
  • the monitoring area refers to the sum of the viewing areas of multiple cameras.
  • the multiple cameras include at least two cameras, and their viewing areas are adjacent or at least partially overlapping, so that as the target object to be tracked moves in the monitoring area, it appears in the viewing area of one or more cameras.
  • the current frames to be measured of the multiple cameras are respectively extracted from the surveillance videos of the multiple cameras, and the current frames to be measured of each camera have the same acquisition time.
  • the target to be tracked in this disclosure is preferably a pedestrian.
  • the target to be tracked may also be other movable objects, such as animals, vehicles, etc., which is not specifically limited in the present disclosure.
  • FIG. 2 shows a schematic monitoring scene in which a camera 201 and a camera 202 are set
  • FIG. 3 shows the viewfinder images of the above-mentioned camera 201 and camera 202.
  • the surveillance video of the camera 201 can be parsed as a sequence of image frames (A1, A2, ..., AN), and the surveillance video of the camera 202 can be parsed as a sequence of image frames (B1, B2, ..., BN); the parsing can be performed in real time online or offline.
  • the method 100 may further include: determining multiple frame numbers to be tested, and iteratively acquiring the current frames to be tested of the multiple cameras in time series according to those frame numbers, so as to perform target tracking iteratively; the initial global target trajectory is obtained corresponding to the initial frame number to be tested, and the iteratively updated global target trajectory is obtained corresponding to the subsequent frame numbers to be tested. This reduces the amount of calculation and improves the real-time performance of global tracking.
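  • The frame-number selection described above can be sketched as a simple stride-based sampler over synchronized frame sequences. The function names and the dictionary layout (camera id mapped to a frame list) are assumptions for illustration.

```python
def frames_to_test(num_frames, stride):
    """Frame numbers to be tested: every `stride`-th frame in time order."""
    return list(range(0, num_frames, stride))

def iterate_current_frames(sequences, stride):
    """Yield, per tested frame number, the synchronized frames of all cameras.

    `sequences` maps camera id -> list of frames with identical capture times.
    """
    num_frames = min(len(seq) for seq in sequences.values())
    for n in frames_to_test(num_frames, stride):
        yield n, {cam: seq[n] for cam, seq in sequences.items()}

seqs = {"cam201": ["A1", "A2", "A3", "A4"], "cam202": ["B1", "B2", "B3", "B4"]}
sampled = list(iterate_current_frames(seqs, stride=2))
```

Increasing the stride trades trajectory granularity for computation, which is the trade-off the claim points at.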
  • the method 100 may further include:
  • Step S102: Perform target detection on the current frame to be tested of each of the multiple cameras in turn, to obtain the detection frame set corresponding to each camera.
  • performing target detection on the current frame to be tested of each camera includes: inputting the current frame to be tested of each camera into a target detection model for target detection; wherein the target detection model is a pedestrian detection model obtained based on neural network training.
  • the current frames to be tested An and Bn of the camera 201 and the camera 202 are shown, and the pre-processed current frames An and Bn are then input into any deep-learning-based pedestrian detection model.
  • the purpose of obtaining the pedestrian detection frames is to obtain the position and size information of all pedestrians in the current frames An and Bn to be tested.
  • the pedestrian detection model may be, for example, a YOLO ("You Only Look Once: Unified, Real-Time Object Detection") model, which is not specifically limited in the present disclosure.
  • obtaining the detection frame set corresponding to each camera further includes: performing projection transformation on the bottom-center point of each detection frame in the detection frame set, according to the viewing position of each camera, to determine the ground coordinates of each detection frame. In this way, the targets identified in the viewing range of each camera can be combined into a unified coordinate system.
  • the bottom-center point of each detection frame corresponding to each camera in Figure 5 can be obtained and converted to obtain the actual ground position of the target object in the monitoring scene.
  • Fig. 6 shows the ground coordinates of each detection frame obtained through projection transformation. Specifically, the ground aisle under the viewing angle of each camera is an approximately trapezoidal area, so for the detection frame set corresponding to each camera, the coordinates of the bottom-center point of each detection frame in a standard rectangular area can first be obtained through a trapezoid-to-rectangle transformation. Secondly, the standard rectangular area is rotated according to the actual layout of the monitoring scene, and the rotated coordinates of each bottom-center point are calculated with a rotation matrix. Finally, the rotated coordinates are translated and scaled according to the actual layout of the monitoring scene to obtain the final coordinate position.
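  • The projection of a detection frame's bottom-center point to ground coordinates can be illustrated with a planar homography, which subsumes the trapezoid-to-rectangle step; the subsequent rotation, translation, and scaling are further linear maps that could be folded into the same 3x3 matrix. The identity matrix below is purely for demonstration; in practice H would be estimated per camera from ground reference points, which the patent does not specify.

```python
def bottom_center(box):
    """Bottom-center of an (x, y, w, h) detection box (y grows downward)."""
    x, y, w, h = box
    return (x + w / 2.0, y + h)

def apply_homography(H, point):
    """Map an image point to ground coordinates with a 3x3 homography H."""
    x, y = point
    xs = H[0][0] * x + H[0][1] * y + H[0][2]
    ys = H[1][0] * x + H[1][1] * y + H[1][2]
    s = H[2][0] * x + H[2][1] * y + H[2][2]
    return (xs / s, ys / s)  # perspective division

# Identity homography: image coordinates map to themselves (demo only).
I = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
ground = apply_homography(I, bottom_center((10, 20, 4, 10)))
```

With a real calibration, the same `apply_homography` call would place every camera's detections in the shared ground coordinate system.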
  • the viewing areas of the multiple cameras overlap at least partially, and the method further includes: dividing the working area of each camera in the ground coordinate system according to the viewing area of each camera, wherein the working areas of the cameras do not overlap each other; if the ground coordinates of any detection frame corresponding to a first camera of the multiple cameras exceed the corresponding working area, that detection frame is removed from the detection frame set of the first camera.
  • the working area of each camera can be divided.
  • the working area of the camera 201 is the X area and the working area of the camera 202 is the Y area, so that the working areas of the cameras are adjacent.
  • the ground coordinates of each detection frame corresponding to each camera need to be located in the working area of that camera, and the detection frame is removed if they are not.
  • the detection frame a3 is removed from the detection frame set corresponding to the camera 201, leaving (a1, a2) for subsequent operations.
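  • The working-area filtering in this example (removing a3 from camera 201's detection set) can be sketched as follows, assuming rectangular working areas; the data layout and names are illustrative, not taken from the patent.

```python
def in_working_area(ground_xy, area):
    """True if a ground coordinate lies inside a rectangular working area."""
    (x, y), (x0, y0, x1, y1) = ground_xy, area
    return x0 <= x <= x1 and y0 <= y <= y1

def filter_detections(detections, area):
    """Drop detection frames whose ground coordinates leave the camera's area."""
    return [d for d in detections if in_working_area(d["ground"], area)]

# Camera 201's working area X as a rectangle (x0, y0, x1, y1), assumed here.
area_x = (0.0, 0.0, 10.0, 10.0)
dets = [
    {"name": "a1", "ground": (2.0, 3.0)},
    {"name": "a2", "ground": (8.0, 1.0)},
    {"name": "a3", "ground": (12.0, 3.0)},  # outside area X -> removed
]
kept = [d["name"] for d in filter_detections(dets, area_x)]
```

Because the working areas do not overlap, each ground point is attributed to exactly one camera, which is what makes the later cross-camera hand-off unambiguous.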
  • the method further includes: cutting off non-critical areas in the working area of each camera. Specifically, whether it is a critical area can be determined based on the specific layout of the monitoring scene. For example, the ceiling area that cannot be passed by pedestrians can be directly cut off, which can reduce the amount of calculation for target tracking.
  • the method 100 may further include:
  • Step S103: Perform target tracking according to the detection frame set corresponding to each camera, and update the global target trajectory according to the tracking result.
  • target detection can be performed according to the initial current frames to be tested A1 and B1 to determine the initial global target trajectory. Further, target detection may be performed according to the subsequently obtained current frames An and Bn, and target tracking may be performed iteratively according to the detection result, so as to iteratively update the global target trajectory.
  • tracking according to the detection frame set corresponding to each camera includes: adopting a multi-target tracking algorithm and performing multi-target tracking based on the detection frame set corresponding to each camera to determine the local tracking information corresponding to each camera; wherein the parameters used in multi-target tracking are determined based on the historical frames to be tested of each camera. This enables multi-target tracking in the monitoring area.
  • the multi-target tracking algorithm is a single-camera target tracking algorithm, such as the DeepSORT algorithm (Simple Online and Realtime Tracking with a Deep Association Metric), so the local tracking information of each camera can be obtained.
  • the parameters used in multi-target tracking are determined based on the historical frames to be tested of each camera.
  • the target frame to be tracked can be determined when any target appears in the working area of a certain camera for the first time; based on the multi-target tracking algorithm and the identified target frame, the subsequent frames to be tested of that camera are tracked, and the local tracking information of the target in the working area of the camera is determined.
  • the multi-target tracking algorithm is the DeepSORT algorithm.
  • other target tracking algorithms can also be used; the present disclosure does not limit which specific tracking algorithm is used.
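  • As a hedged illustration of per-camera multi-target tracking, the sketch below uses simple greedy IoU matching rather than DeepSORT itself (DeepSORT additionally uses a Kalman motion model and a deep appearance metric). It only shows the core idea: each new detection frame either extends an existing local track or starts a new one.

```python
def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax0, ay0, aw, ah = a
    bx0, by0, bw, bh = b
    ix0, iy0 = max(ax0, bx0), max(ay0, by0)
    ix1, iy1 = min(ax0 + aw, bx0 + bw), min(ay0 + ah, by0 + bh)
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

def update_tracks(tracks, detections, next_id, threshold=0.3):
    """Greedy IoU matching: extend matched tracks, start new ones otherwise."""
    updated = {}
    for box in detections:
        best = max(tracks, key=lambda t: iou(tracks[t], box), default=None)
        if best is not None and best not in updated and iou(tracks[best], box) >= threshold:
            updated[best] = box          # same identity continues
        else:
            updated[next_id] = box       # a new target appeared
            next_id += 1
    return updated, next_id

tracks = {1: (0, 0, 10, 10)}
tracks, next_id = update_tracks(tracks, [(1, 1, 10, 10), (50, 50, 5, 5)], next_id=2)
```

A production tracker would replace the greedy loop with Hungarian assignment and add the motion/appearance terms, but the identity-bookkeeping interface stays the same.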
  • updating the global target trajectory according to the tracking result further includes: adding an identity to each detection frame according to the local tracking information corresponding to each camera, and updating the global target trajectory with the ground coordinates of each detection frame based on that identity.
  • the curve part shows the current global target trajectory, that is, the global target trajectory determined in the last iteration, and the points a1, a2 and b respectively represent the ground coordinates of the detection frames shown in FIG. 5. The detection frame a2 is labeled "target 2", and the ground coordinates of the point a2 are added to the existing trajectory of "target 2" (i.e., the "target 2" dashed curve in FIG. 6). If the local tracking information corresponding to the camera 201 indicates that no existing target matches the detection frame a1, the detection frame a1 is labeled "target 3" and a new trajectory of "target 3" is created.
  • updating the global target trajectory according to the tracking result further includes: determining the association relationship between the multiple cameras from their working areas; determining the newly added and disappearing detection frames in the corresponding working area according to the local tracking information of each camera; associating newly added and disappearing detection frames in different working areas according to the association relationship between the cameras to obtain association information; and updating the global target trajectory according to the association information.
  • the association relationship between the multiple cameras is, for example, that the area X and the area Y adjoin at a specified position, so that a moving target can cross between different working areas at that adjacent position.
  • the association information refers to the association between a newly added detection frame in one working area and a disappearing detection frame in another working area, that is, they correspond to the same identity.
  • the disappearance order of multiple tracked targets at the adjoining boundary of one working area can be obtained and matched with the appearance order of the newly added targets at the adjoining boundary of the other working area; the newly added targets are assigned the corresponding identifiers and continue to be tracked.
  • the point b in the area Y represents the ground coordinates of the detection frame b shown in FIG. 5. If the local tracking information corresponding to the camera 202 indicates that no existing target matches the detection frame at point b, there is a new target in the area Y; and if the local tracking information corresponding to the camera 201 indicates that the continuously tracked "target 1" has disappeared from the current detection frames, there is a disappearing target in area X. The detection frame b can then be labeled "target 1" and the ground coordinates of point b added to the existing trajectory of "target 1" (i.e., the "target 1" dashed curve in FIG. 6), achieving cross-camera, cross-working-area target tracking.
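  • The cross-camera hand-off in this example (the disappearing "target 1" in area X matched to the new detection b in area Y) can be sketched as follows. Matching disappeared and newly appeared targets purely by their order at the shared boundary is a simplifying assumption made for illustration; the patent only requires that the association relationship between working areas be used.

```python
def associate_across_cameras(disappeared, appeared):
    """Carry identities across adjacent working areas.

    `disappeared`: identities that left area X at the shared boundary, in
    order of disappearance. `appeared`: anonymous new detections in area Y
    at that boundary, in order of appearance. Pairing by order is an
    illustrative simplification.
    """
    matched = dict(zip(appeared, disappeared))   # new detection -> old identity
    leftover = appeared[len(disappeared):]       # genuinely new targets
    return matched, leftover

ids, new = associate_across_cameras(disappeared=["target1"], appeared=["b", "c"])
```

The matched identities are then used to extend the existing global trajectories, while leftover detections start new ones.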
  • with the multi-camera target tracking method of the present invention, by sequentially performing detection on the current frame to be tested from each camera and then performing global tracking in the monitoring area based on the detection result corresponding to each camera, global tracking of target objects in multi-channel surveillance videos can be achieved with fewer computing resources, reducing the demand for computing resources. For example, there is no need to provide separate GPU computing resources for each camera to track the target object in each local area; instead, fewer computing resources can be provided for global tracking of the target object in the whole monitoring area.
  • an embodiment of the present invention also provides a target tracking device for executing the target tracking method provided in any of the foregoing embodiments.
  • Fig. 7 is a schematic structural diagram of a target tracking device provided by an embodiment of the present invention.
  • the apparatus 700 includes:
  • the acquiring unit 701 is configured to acquire current frames to be measured of multiple cameras arranged in the monitoring area;
  • the detection unit 702 is configured to sequentially perform target detection on the current frame to be tested of each of the multiple cameras to obtain a set of detection frames corresponding to each camera;
  • the tracking unit 703 is configured to perform target tracking according to the detection frame set corresponding to each camera, and determine the global target trajectory according to the tracking result.
  • the apparatus 700 further includes: a frame selection unit, configured to determine multiple frame numbers to be tested and to iteratively acquire the current frames to be tested from the multiple cameras in time sequence according to those frame numbers, so as to perform target tracking iteratively; wherein the initial global target trajectory is obtained according to the initial frame number to be tested, and the iteratively updated global target trajectory is obtained according to the subsequent frame numbers to be tested.
  • the detection unit 702 is further configured to: input the current frame to be measured of each camera into a target detection model for target detection; wherein the target detection model is a pedestrian detection model obtained based on neural network training.
  • the detection unit 702 is further configured to: after obtaining the detection frame set corresponding to each camera, perform projection transformation on the bottom-center point of each detection frame in the detection frame set, according to the viewing position of each camera, to determine the ground coordinates of each detection frame.
  • the viewing areas of the multiple cameras overlap at least partially, and the device 700 is further configured to: divide the working area of each camera in the ground coordinate system according to the viewing area of each camera, wherein the working areas of the cameras do not overlap each other; if the ground coordinates of any detection frame corresponding to a first camera of the multiple cameras exceed the corresponding working area, that detection frame is removed from the detection frame set of the first camera.
  • the detection unit 702 is also used to cut off non-critical areas in the working area of each camera.
  • the tracking unit 703 is further configured to: adopt a multi-target tracking algorithm and perform multi-target tracking based on the detection frame set corresponding to each camera to determine the local tracking information corresponding to each camera; wherein the parameters used in multi-target tracking are determined based on the historical frames to be tested of each camera.
  • the multi-target tracking algorithm is the DeepSORT algorithm.
  • the tracking unit 703 is further configured to: add an identity to each detection frame according to the local tracking information corresponding to each camera, and determine the iteratively updated global target trajectory based on the identity and ground coordinates of each detection frame.
  • the tracking unit 703 is further configured to: determine the association relationship between the multiple cameras according to their working areas; determine the newly added and disappearing detection frames in the corresponding working area according to the local tracking information of each camera; associate newly added and disappearing detection frames in different working areas according to the association relationship between the cameras to obtain association information; and determine the iteratively updated global target trajectory according to the association information.
  • the current frame to be tested from each camera is detected in sequence, and global tracking is then performed in the monitoring area based on the detection result corresponding to each camera. This allows global tracking of the target objects in multiple channels of surveillance video with fewer computing resources, reducing the demand for computing resources: for example, there is no need to provide separate GPU computing resources for tracking the target object in each camera's local area; instead, fewer computing resources suffice for global tracking of the target object across the monitoring area.
  • an embodiment of the present invention further provides a target tracking system, which includes: a plurality of cameras arranged in a monitoring area, and a target tracking device communicatively connected with each of the plurality of cameras, where the target tracking device is configured to execute the target tracking method provided in any of the above embodiments.
  • a target tracking device of the present invention may include one or more processors and at least one memory.
  • the memory stores a program, and when the program is executed by the processor, the processor is caused to perform the steps shown in FIG. 1: acquiring the current frames to be tested of multiple cameras arranged in the monitoring area; sequentially performing target detection on the current frame to be tested of each of the multiple cameras to obtain the detection frame set corresponding to each camera; and performing target tracking according to the detection frame set corresponding to each camera, and determining the global target trajectory according to the tracking result.
  • the target tracking device 8 according to this embodiment of the present invention will be described below with reference to FIG. 8.
  • the device 8 shown in FIG. 8 is only an example, and should not impose any limitation on the functions and scope of application of the embodiments of the present invention.
  • the apparatus 8 may take the form of a general-purpose computing device, including but not limited to: at least one processor 10, at least one memory 20, and a bus 60 connecting different device components.
  • the bus 60 includes a data bus, an address bus, and a control bus.
  • the memory 20 may include a volatile memory, such as a random access memory (RAM) 21 and/or a cache memory 22, and may further include a read-only memory (ROM) 23.
  • the memory 20 may also include a program module 24.
  • the program module 24 includes, but is not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment.
  • the apparatus 8 can also communicate with one or more external devices 2 (for example, a keyboard, a pointing device, a Bluetooth device, etc.), and with one or more other devices. Such communication can be performed through an input/output (I/O) interface 40 and displayed on the display unit 30.
  • the device 8 may also communicate with one or more networks (for example, a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through the network adapter 50. As shown in the figure, the network adapter 50 communicates with the other modules in the device 8 through the bus 60.
  • Fig. 9 shows a computer-readable storage medium for executing the method as described above.
  • various aspects of the present invention can also be implemented in the form of a computer-readable storage medium, which includes program code; when the program code is executed by a processor, the program code causes the processor to execute the method described above.
  • the computer-readable storage medium may adopt any combination of one or more readable media.
  • the readable medium may be a readable signal medium or a readable storage medium.
  • the readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • as shown in FIG. 9, a computer-readable storage medium 90 according to an embodiment of the present invention is described; it may adopt a portable compact disc read-only memory (CD-ROM), include program code, and run on a terminal device such as a personal computer.
  • the computer-readable storage medium of the present invention is not limited to this.
  • the readable storage medium can be any tangible medium that contains or stores a program.
  • the program can be used by or in combination with an instruction execution system, apparatus, or device.
  • the program code used to perform the operations of the present invention can be written in any combination of one or more programming languages.
  • the programming languages include object-oriented programming languages, such as Java, Python, and C++, as well as conventional procedural programming languages, such as the "C" language or similar programming languages.
  • the program code can be executed entirely on the user's computing device, partly on the user's device and partly on a remote computing device, or entirely on the remote computing device or server.
  • the remote computing device can be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computing device (for example, through the Internet using an Internet service provider).

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)
  • Closed-Circuit Television Systems (AREA)
  • Burglar Alarm Systems (AREA)

Abstract

A target tracking method, apparatus and system, and a computer-readable storage medium. The method comprises: obtaining current frames to be detected of multiple cameras disposed in a monitoring area (step S101); performing target detection on the current frame to be detected of each of the multiple cameras in sequence to obtain a bounding box set corresponding to each camera (step S102); and performing target tracking according to the bounding box set corresponding to each camera, and determining a global target trajectory according to a tracking result (step S103). Using the method above can reduce computing resources for target tracking based on multiple cameras.

Description

Target tracking method, device, system and computer-readable storage medium

Technical field

The invention belongs to the field of image processing, and specifically relates to a target tracking method, device, system and computer-readable storage medium.

Background art

This section is intended to provide background or context for the embodiments of the invention recited in the claims. The description herein is not admitted to be prior art merely by its inclusion in this section.

At present, with the popularization of video surveillance technology and ever-increasing security requirements, target tracking applied in the field of video surveillance has gradually become one of the hot topics in computer vision research. Tracking the movement trajectory of a target object usually requires acquiring images of a camera's surveillance area, performing target detection on the images to identify the target, and tracking the identified target object so as to obtain its complete trajectory. Owing to the complexity of surveillance scenes and the limited field of view of a single camera, the cooperation of multiple cameras may be required to achieve global coverage of the monitoring area. However, existing multi-camera target tracking methods analyze images and track targets through deep learning; as the number of cameras increases, the demands on both computing and communication resources grow significantly, creating a technical bottleneck for target tracking.
Summary of the invention

In view of the above-mentioned problems in the prior art, a target tracking method, device, and computer-readable storage medium are proposed, by means of which the above-mentioned problems can be solved.

The present invention provides the following solutions.

In a first aspect, a target tracking method is provided, including: acquiring the current frames to be tested of multiple cameras arranged in a monitoring area; performing target detection on the current frame to be tested of each of the multiple cameras in turn to obtain the detection frame set corresponding to each camera; and performing target tracking according to the detection frame set corresponding to each camera, and determining a global target trajectory according to the tracking result.

In some possible implementations, the method further includes: determining multiple frame numbers to be tested, and iteratively acquiring the current frames to be tested of the multiple cameras in time sequence according to the multiple frame numbers, so as to perform target tracking iteratively; where the initial global target trajectory is obtained for the initial frame number among the multiple frame numbers to be tested, and the iteratively updated global target trajectory is obtained for each subsequent frame number.

In some possible implementations, performing target detection on the current frame to be tested of each camera includes: inputting the current frame to be tested of each camera into a target detection model for target detection, where the target detection model is a pedestrian detection model obtained through neural network training.
In some possible implementations, after the detection frame set corresponding to each camera is obtained, the method further includes: performing projection transformation on the bottom-center point of each detection frame in the detection frame set corresponding to each camera according to the viewing position of that camera, so as to determine the ground coordinates of each detection frame.
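In practice, this projection transformation is commonly a planar homography from image pixels to the ground plane. A minimal sketch in Python (the calibration point pairs, the (x, y, w, h) box format, and the coordinate units are illustrative assumptions, not values from this disclosure):

```python
import numpy as np

def find_homography(src, dst):
    """Solve for the 3x3 homography H mapping each src (x, y) to its dst (u, v),
    from four non-degenerate point correspondences."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.array(A, dtype=float), np.array(b, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)

def box_to_ground(box, H):
    """Project the bottom-center point of an (x, y, w, h) detection box
    through H and dehomogenize, giving ground-plane coordinates."""
    x, y, w, h = box
    p = H @ np.array([x + w / 2.0, y + h, 1.0])
    return p[0] / p[2], p[1] / p[2]

# Hypothetical one-off calibration for a camera: four pixel points and the
# ground coordinates (in meters) they are known to correspond to.
image_pts = [(0, 0), (100, 0), (100, 100), (0, 100)]
ground_pts = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
H = find_homography(image_pts, ground_pts)

# A detection box whose bottom-center pixel is (50, 60) lands near (5, 6) m.
print(box_to_ground((40, 20, 20, 40), H))
```

OpenCV users would typically call `cv2.findHomography` and `cv2.perspectiveTransform` rather than the hand-rolled solver above.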
In some possible implementations, the viewing areas of the multiple cameras at least partially overlap, and the method further includes: dividing the working area of each camera in the ground coordinate system according to the viewing area of each camera, where the working areas of the cameras do not overlap one another; if the ground coordinates of any detection frame corresponding to a first camera among the multiple cameras fall outside the corresponding working area, that detection frame is removed from the detection frame set of the first camera.
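The work-area check described here reduces to a point-in-polygon test on each detection frame's ground coordinates. A minimal sketch (the rectangular work area and the detection records are illustrative assumptions):

```python
def in_polygon(pt, poly):
    """Ray-casting point-in-polygon test; poly is a list of (x, y) vertices."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % n]
        # Count edge crossings of a horizontal ray extending to the right.
        if (y1 > y) != (y2 > y) and x < (x2 - x1) * (y - y1) / (y2 - y1) + x1:
            inside = not inside
    return inside

def filter_to_work_area(detections, work_area):
    """Drop detections whose ground coordinates fall outside the camera's work area."""
    return [d for d in detections if in_polygon(d["ground"], work_area)]

area_cam1 = [(0, 0), (5, 0), (5, 8), (0, 8)]   # illustrative work-area polygon
dets = [{"id": 1, "ground": (2.0, 3.0)}, {"id": 2, "ground": (6.5, 3.0)}]
print(filter_to_work_area(dets, area_cam1))    # only detection 1 is kept
```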
In some possible implementations, the method further includes: cutting off the non-critical areas in the working area of each camera.

In some possible implementations, tracking according to the detection frame set corresponding to each camera includes: adopting a multi-target tracking algorithm and performing multi-target tracking based on the detection frame set corresponding to each camera, so as to determine the local tracking information corresponding to each camera, where the parameters used in multi-target tracking are determined based on the historical frames to be tested of each camera.
In some possible implementations, the multi-target tracking algorithm is the DeepSORT algorithm.
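DeepSORT itself combines Kalman-filter motion prediction, appearance embeddings, and Hungarian matching; as a rough illustration of only the association idea at its core, a greedy IoU matcher between the previous frame's tracked boxes and the current detections might look like the sketch below (a simplification, not the DeepSORT implementation):

```python
def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    ix = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    iy = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = ix * iy
    return inter / (a[2] * a[3] + b[2] * b[3] - inter) if inter else 0.0

def associate(tracks, detections, iou_min=0.3):
    """Greedily match previous-frame track boxes to current detections by IoU."""
    pairs = sorted(((iou(t, d), ti, di)
                    for ti, t in enumerate(tracks)
                    for di, d in enumerate(detections)), reverse=True)
    used_t, used_d, matches = set(), set(), []
    for score, ti, di in pairs:
        if score >= iou_min and ti not in used_t and di not in used_d:
            matches.append((ti, di))
            used_t.add(ti)
            used_d.add(di)
    return matches

tracks = [(0, 0, 10, 20), (100, 100, 10, 20)]   # boxes tracked in the previous frame
detections = [(2, 1, 10, 20), (50, 50, 5, 5)]   # boxes detected in the current frame
print(associate(tracks, detections))            # [(0, 0)]
```

A full tracker would additionally predict each track's position before matching and manage track creation and deletion, which is what the per-camera parameters mentioned above would tune.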
In some possible implementations, the method further includes: adding an identity to each detection frame according to the local tracking information corresponding to each camera; and determining the iteratively updated global target trajectory based on the identity and ground coordinates of each detection frame.
In some possible implementations, the method further includes: determining the association relationship among the multiple cameras according to the working areas of the multiple cameras; determining the newly appearing detection frames and the disappearing detection frames in the corresponding working area according to the local tracking information of each camera; associating newly appearing and disappearing detection frames located in different working areas according to the association relationship among the multiple cameras to obtain association information; and determining the iteratively updated global target trajectory according to the association information.
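The cross-camera association described above can be sketched as matching tracks that disappear in one work area against tracks that newly appear in an adjacent one, gated by time gap and ground-plane distance; the adjacency table, field names, and thresholds below are illustrative assumptions:

```python
import math

# Illustrative adjacency between camera work areas (which areas border one another).
ADJACENT = {"cam1": ["cam2"], "cam2": ["cam1"]}

def handoff(disappeared, appeared, max_dist=1.5, max_gap=10):
    """Link a track that left one work area to a track that newly appeared in an
    adjacent area, when the two are close in ground coordinates and in time."""
    links = []
    for d in disappeared:
        for a in appeared:
            if a["cam"] not in ADJACENT.get(d["cam"], []):
                continue
            close = math.dist(d["ground"], a["ground"]) <= max_dist
            soon = 0 <= a["frame"] - d["frame"] <= max_gap
            if close and soon:
                links.append((d["track_id"], a["track_id"]))
    return links

gone = [{"cam": "cam1", "track_id": 7, "ground": (4.8, 3.0), "frame": 120}]
new = [{"cam": "cam2", "track_id": 31, "ground": (5.2, 3.1), "frame": 125}]
print(handoff(gone, new))  # [(7, 31)]
```

Linked track identities would then be merged so that the global trajectory continues across the work-area boundary.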
In a second aspect, a target tracking device is provided, including: an acquisition unit for acquiring the current frames to be tested of multiple cameras arranged in a monitoring area; a detection unit for performing target detection on the current frame to be tested of each of the multiple cameras in turn to obtain the detection frame set corresponding to each camera; and a tracking unit for performing target tracking according to the detection frame set corresponding to each camera and determining a global target trajectory according to the tracking result.

In some possible implementations, the device further includes: a frame selection unit for determining multiple frame numbers to be tested and iteratively acquiring the current frames to be tested of the multiple cameras in time sequence according to the multiple frame numbers, so as to perform target tracking iteratively; where the initial global target trajectory is obtained for the initial frame number among the multiple frame numbers to be tested, and the iteratively updated global target trajectory is obtained for each subsequent frame number.

In some possible implementations, the detection unit is further configured to: input the current frame to be tested of each camera into a target detection model for target detection, where the target detection model is a pedestrian detection model obtained through neural network training.

In some possible implementations, the detection unit is further configured to: after the detection frame set corresponding to each camera is obtained, perform projection transformation on the bottom-center point of each detection frame in the detection frame set corresponding to each camera according to the viewing position of that camera, so as to determine the ground coordinates of each detection frame.

In some possible implementations, the viewing areas of the multiple cameras at least partially overlap, and the device is further configured to: divide the working area of each camera in the ground coordinate system according to the viewing area of each camera, where the working areas of the cameras do not overlap one another; if the ground coordinates of any detection frame corresponding to a first camera among the multiple cameras fall outside the corresponding working area, that detection frame is removed from the detection frame set of the first camera.

In some possible implementations, the detection unit is further configured to: cut off the non-critical areas in the working area of each camera.

In some possible implementations, the tracking unit is further configured to: adopt a multi-target tracking algorithm and perform multi-target tracking based on the detection frame set corresponding to each camera, so as to determine the local tracking information corresponding to each camera, where the parameters used in multi-target tracking are determined based on the historical frames to be tested of each camera.

In some possible implementations, the multi-target tracking algorithm is the DeepSORT algorithm.

In some possible implementations, the tracking unit is further configured to: add an identity to each detection frame according to the local tracking information corresponding to each camera, and determine the iteratively updated global target trajectory based on the identity and ground coordinates of each detection frame.

In some possible implementations, the tracking unit is further configured to: determine the association relationship among the multiple cameras according to the working areas of the multiple cameras; determine the newly appearing detection frames and the disappearing detection frames in the corresponding working area according to the local tracking information of each camera; associate newly appearing and disappearing detection frames located in different working areas according to the association relationship among the multiple cameras to obtain association information; and determine the iteratively updated global target trajectory according to the association information.
In a third aspect, a target tracking system is provided, including: a plurality of cameras arranged in a monitoring area, and a target tracking device communicatively connected with each of the plurality of cameras, where the target tracking device is configured to execute the method of the first aspect.

In a fourth aspect, a target tracking device is provided, including: one or more multi-core processors; and a memory for storing one or more programs; when the one or more programs are executed by the one or more multi-core processors, the one or more multi-core processors are caused to: acquire the current frames to be tested of multiple cameras arranged in a monitoring area; perform target detection on the current frame to be tested of each of the multiple cameras in turn to obtain the detection frame set corresponding to each camera; and perform target tracking according to the detection frame set corresponding to each camera and determine a global target trajectory according to the tracking result.

In a fifth aspect, a computer-readable storage medium is provided, the computer-readable storage medium storing a program which, when executed by a multi-core processor, causes the multi-core processor to execute the method of the first aspect.

The above technical solutions adopted in the embodiments of this application can achieve the following beneficial effects: in these embodiments, the current frame to be tested from each camera is detected in sequence, and global tracking is then performed in the monitoring area based on the detection result corresponding to each camera; in this way, global tracking of the target objects in multiple channels of surveillance video, that is, multi-camera target tracking, can be achieved with fewer computing resources.

It should be understood that the above description is only an overview of the technical solution of the present invention, provided so that the technical means of the present invention can be understood more clearly and implemented in accordance with the contents of the specification. In order to make the above and other objects, features, and advantages of the present invention more apparent, specific embodiments of the present invention are illustrated below by way of example.
Description of the drawings

By reading the following detailed description of the exemplary embodiments, those of ordinary skill in the art will understand the advantages and benefits described herein, as well as other advantages and benefits. The drawings are only for the purpose of illustrating exemplary embodiments and are not to be considered limiting of the present invention. Throughout the drawings, the same reference numerals denote the same components. In the drawings:
Fig. 1 is a schematic flowchart of a target tracking method according to an embodiment of the present invention;

Fig. 2 is a schematic plan view of the ground of a monitoring area according to an embodiment of the present invention;

Fig. 3 is a schematic diagram of the viewfinder images of multiple cameras according to an embodiment of the present invention;

Fig. 4 is a schematic diagram of the current frames to be tested of multiple cameras according to an embodiment of the present invention;

Fig. 5 is a schematic diagram of the detection frame sets corresponding to multiple cameras according to an embodiment of the present invention;

Fig. 6 is a schematic diagram of a global target trajectory according to an embodiment of the present invention;

Fig. 7 is a schematic structural diagram of a target tracking device according to an embodiment of the present invention;

Fig. 8 is a schematic structural diagram of a target tracking device according to another embodiment of the present invention;

Fig. 9 is a schematic diagram of a computer-readable storage medium according to an embodiment of the present invention.
In the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.

Detailed description

Hereinafter, exemplary embodiments of the present disclosure will be described in more detail with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present disclosure, it should be understood that the present disclosure can be implemented in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that the present disclosure will be understood more thoroughly and its scope will be fully conveyed to those skilled in the art.

In the present invention, it should be understood that terms such as "including" or "having" are intended to indicate the existence of the features, numbers, steps, actions, components, parts, or combinations thereof disclosed in this specification, and are not intended to exclude the possibility that one or more other features, numbers, steps, actions, components, parts, or combinations thereof exist.

It should also be noted that, where there is no conflict, the embodiments of the present invention and the features in the embodiments can be combined with one another. Hereinafter, the present invention will be described in detail with reference to the drawings and in conjunction with the embodiments.
When tracking moving targets within a monitoring area, image detection can be performed in turn on the current frame to be tested from each camera, and global tracking can then be performed in the monitoring area based on the detection results corresponding to the cameras; in this way, global tracking of the target objects in multiple channels of surveillance video is achieved with fewer computing resources, reducing the demand for computing resources.

After introducing the basic principles of the present invention, various non-limiting embodiments of the present invention are described in detail below.
Fig. 1 schematically shows a flowchart of a target tracking method 100 according to an embodiment of the present invention.

As shown in Fig. 1, the method 100 may include:

Step S101: acquiring the current frames to be tested of multiple cameras arranged in a monitoring area.

Specifically, the monitoring area refers to the union of the viewing areas of the multiple cameras. The multiple cameras include at least two cameras, and their viewing areas are adjacent to or at least partially overlap one another, so that a target object to be tracked can move through the monitoring area and appear in the viewing area of any one or more cameras. The current frames to be tested of the multiple cameras are respectively extracted from the surveillance videos of the multiple cameras, the current frames to be tested of all cameras having the same acquisition time. Optionally, the target to be tracked in this disclosure is preferably a pedestrian; those skilled in the art will understand that the target to be tracked may also be another movable object, such as an animal or a vehicle, which is not specifically limited in the present disclosure.
For example, in complex surveillance scenarios, such as corridors, large shopping malls, and computer rooms, a large number of cameras are usually used to monitor the various areas, yielding multiple channels of surveillance video. Fig. 2 shows a schematic monitoring scene in which a camera 201 and a camera 202 are arranged, and Fig. 3 shows the viewfinder images of the camera 201 and the camera 202. The surveillance video of the camera 201 can be parsed into an image frame sequence (A1, A2, ..., AN), and the surveillance video of the camera 202 can be parsed into an image frame sequence (B1, B2, ..., BN), where the parsing can be performed online in real time or offline. On this basis, the current frames to be tested An and Bn of the two cameras can be extracted in time sequence from the above image frame sequences to perform the target tracking shown in this disclosure, where the subscript n may take the values n = 1, 2, ..., N.
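The synchronized consumption of per-camera frame sequences described above can be sketched as follows (plain lists stand in for the decoded video streams; the camera names are illustrative):

```python
def synced_frames(streams):
    """Yield (n, {camera: frame}) pairs, one per shared capture time.
    `streams` maps camera names to their decoded frame sequences."""
    length = min(len(seq) for seq in streams.values())
    for n in range(length):
        yield n + 1, {cam: seq[n] for cam, seq in streams.items()}

# Plain lists stand in for the frame sequences parsed from the two videos.
streams = {"cam201": ["A1", "A2", "A3"], "cam202": ["B1", "B2", "B3"]}
for n, frames in synced_frames(streams):
    print(n, frames)
```

With real video, each list would be replaced by a decoder such as OpenCV's `VideoCapture`, reading one frame per camera per iteration.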
In some possible embodiments, the method 100 may further include: determining multiple frame numbers to be tested, and iteratively acquiring the current frames to be tested of the multiple cameras in time sequence according to the multiple frame numbers, so as to perform target tracking iteratively; where the initial global target trajectory is obtained for the initial frame number among the multiple frame numbers to be tested, and the iteratively updated global target trajectory is obtained for each subsequent frame number. This reduces the amount of computation and improves the real-time performance of global tracking.
Specifically, the multiple frame numbers to be tested can be determined according to a preset frame-sampling strategy. For example, for surveillance video at 24 frames per second, the current frames to be tested An and Bn can be obtained from the surveillance videos of the camera 201 and the camera 202 every 5 frames, where the subscript n takes the values n = 1, 6, 11, and so on. However, another frame interval may be adopted, or frame-by-frame detection may be adopted; this disclosure places no specific limitation on this. On this basis, the initial global target trajectory can be obtained from the current frames to be tested A1 and B1 corresponding to the initial frame number (n = 1), and iterative target tracking can further be performed on the current frames to be tested An and Bn corresponding to the subsequent frame numbers (n = 6, 11, ...), thereby obtaining the iteratively updated global target trajectory.
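The frame-sampling strategy above (every 5th frame, giving n = 1, 6, 11, ...) reduces to generating the list of frame numbers to be tested:

```python
def sample_indices(total_frames, stride=5, start=1):
    """Frame numbers to test: every `stride`-th frame starting at `start`,
    i.e. n = 1, 6, 11, ... for a stride of 5."""
    return list(range(start, total_frames + 1, stride))

print(sample_indices(24))  # [1, 6, 11, 16, 21]
```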
如图1所示,该方法100还可以包括:As shown in FIG. 1, the method 100 may further include:
步骤S102、依次对多个摄像头中每个摄像头的当前待测帧进行目标检测,得到每个摄像头对应的检测框集合;Step S102: Perform target detection on the current frame to be tested of each of the multiple cameras in turn, to obtain a set of detection frames corresponding to each camera;
在一个可能的实施方式中,对每个摄像头的当前待测帧进行目标检测,包括:将每个摄像头的当前待测帧输入目标检测模型进行目标检测;其中,目标检测模型是基于神经网络训练得到的行人检测模型。In a possible implementation manner, performing target detection on the current frame to be tested of each camera includes: inputting the current frame to be tested of each camera into a target detection model for target detection; wherein the target detection model is a pedestrian detection model obtained based on neural network training.
例如,如图4所示,示出了摄像头201和摄像头202的当前待测帧A n和B n,然后,在任意基于深度学习的行人检测模型中输入预处理后的当前待测帧A n和B n进行检测,输出针对每个摄像头的一系列行人检测框。获取行人检测框的目的在于获取当前待测帧A n和B n中所有行人的位置信息和尺寸信息。行人检测模型比如可以是YOLO(统一实时目标检测,You Only Look Once)模型等,本公开对此不作具体限制。如图5所示,示出了对多个当前待测帧A n和B n进行检测得到的多个检测框集合,其中摄像头201对应的检测框集合为(a 1,a 2,a 3),摄像头202对应的检测框集合为(b)。 For example, as shown in FIG. 4, the current frames A n and B n to be tested of the camera 201 and the camera 202 are shown; then, the preprocessed current frames A n and B n are input into any deep-learning-based pedestrian detection model for detection, and a series of pedestrian detection frames is output for each camera. The purpose of obtaining the pedestrian detection frames is to obtain the position information and size information of all pedestrians in the current frames A n and B n to be tested. The pedestrian detection model may be, for example, a YOLO (You Only Look Once, unified real-time object detection) model, which is not specifically limited in the present disclosure. As shown in FIG. 5, multiple detection frame sets obtained by detecting the multiple current frames A n and B n to be tested are shown, in which the detection frame set corresponding to the camera 201 is (a 1 , a 2 , a 3 ) and the detection frame set corresponding to the camera 202 is (b).
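The per-camera detection loop of step S102 can be sketched as below; `detect` stands in for any deep-learning pedestrian detector (e.g. a YOLO-style model), and its interface is an assumption made only for illustration:

```python
def run_detection(frames_by_camera, detect):
    """frames_by_camera: {camera_id: frame image}.
    detect(frame) -> list of (x, y, w, h) pedestrian boxes (hypothetical interface).
    Returns {camera_id: [boxes]} -- the per-camera detection frame sets."""
    detections = {}
    for cam_id, frame in frames_by_camera.items():  # each camera processed in turn
        detections[cam_id] = detect(frame)
    return detections
```

A single (cheaper) detection pass per sampled frame, shared across all cameras, is what allows the later tracking stages to run without per-camera GPU resources.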
在一个可能的实施方式中,在得到每个摄像头对应的检测框集合之后,还包括:根据每个摄像头的取景位置、以及每个摄像头对应的检测框集合中的每个检测框的框底中心点进行投影变换,从而确定每个摄像头对应的检测框集合中每个检测框的地面坐标。这样,可以将每个摄像头取景范围内识别的目标组合到统一的坐标系中。In a possible implementation manner, after the detection frame set corresponding to each camera is obtained, the method further includes: performing projection transformation according to the framing position of each camera and the bottom-center point of each detection frame in the detection frame set corresponding to each camera, so as to determine the ground coordinates of each detection frame in the detection frame set corresponding to each camera. In this way, the targets identified within the framing range of each camera can be combined into a unified coordinate system.
例如,可以获取图5中每个摄像头对应的每个检测框的框底中心点位置,对该每个检测框的框底中心点位置进行转换,得到目标对象在监控场景中的实际地面位置,图6示出了通过投影转换获得的每个检测框的地面坐标。具体而言,可以看出,每个摄像头视角下的地面过道是一个近似梯形区域,因此针对每个摄像头对应的检测框集合,首先可以通过梯形-矩形转换得到每个检测框的框底中心点在标准矩形区域中的坐标,其次根据监控场景的实际布局对标准矩形区域进行旋转,通过旋转矩阵计算得到每个检测框的框底中心点的旋转后坐标,最后根据监控场景的实际布局对旋转后坐标进行平移和缩放,得到最终的坐标位置。For example, the position of the bottom-center point of each detection frame corresponding to each camera in FIG. 5 can be obtained, and the position of the bottom-center point of each detection frame can be converted to obtain the actual ground position of the target object in the monitoring scene; FIG. 6 shows the ground coordinates of each detection frame obtained through projection transformation. Specifically, it can be seen that the ground aisle under the viewing angle of each camera is an approximately trapezoidal area. Therefore, for the detection frame set corresponding to each camera, the coordinates of the bottom-center point of each detection frame in a standard rectangular area can first be obtained through a trapezoid-to-rectangle transformation; secondly, the standard rectangular area is rotated according to the actual layout of the monitoring scene, and the rotated coordinates of the bottom-center point of each detection frame are calculated through a rotation matrix; finally, the rotated coordinates are translated and scaled according to the actual layout of the monitoring scene to obtain the final coordinate position.
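The three-step conversion described above (trapezoid-to-rectangle mapping, rotation, then translation and scaling) can be sketched in plain Python; the homography H, rotation angle, offset and scale are assumed to come from per-camera calibration and their values here are hypothetical:

```python
import math

def bottom_center(box):
    # box = (x, y, w, h) in pixels; the bottom-center approximates the feet position
    x, y, w, h = box
    return (x + w / 2.0, y + h)

def apply_homography(pt, H):
    # H: 3x3 nested list mapping the trapezoidal floor region to a standard rectangle
    x, y = pt
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (u / w, v / w)

def to_ground(pt, H, angle_deg, offset, scale):
    x, y = apply_homography(pt, H)              # 1) trapezoid -> rectangle
    a = math.radians(angle_deg)                 # 2) rotate to floor-plan axes
    xr = x * math.cos(a) - y * math.sin(a)
    yr = x * math.sin(a) + y * math.cos(a)
    return ((xr + offset[0]) * scale,           # 3) translate and scale
            (yr + offset[1]) * scale)
```

For instance, a box (10, 20, 4, 6) has bottom-center (12, 26); with an identity homography, zero rotation, offset (1, 1) and scale 2 its ground coordinate becomes (26, 54).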
在一个可能的实施方式中,多个摄像头的取景区域至少部分地重叠,方法还包括:根据每个摄像头的取景区域在地面坐标系中划分每个摄像头的工作区域;其中,每个摄像头的工作区域互不重叠,若多个摄像头中的第一摄像头对应的任意一个检测框的地面坐标超出对应的工作区域,则在第一摄像头的检测框集合中去除任意一个检测框。In a possible implementation manner, the framing areas of the multiple cameras overlap at least partially, and the method further includes: dividing the working area of each camera in the ground coordinate system according to the framing area of each camera; wherein the working areas of the cameras do not overlap each other, and if the ground coordinates of any detection frame corresponding to a first camera among the multiple cameras fall outside the corresponding working area, that detection frame is removed from the detection frame set of the first camera.
例如,如图2所示,为了使得监控场景中不存在监控盲区,摄像头201和摄像头202的取景区域实际上存在重叠。基于此,为了有效避免坐标显示冲突的问题,可以对每个摄像头进行工作区域的划分,比如,摄像头201的工作区域为X区域,摄像头202的工作区域为Y区域,使得每个摄像头的工作区域相邻接。进一步地,每个摄像头对应的每个检测框的地面坐标需位于该摄像头的工作区域内,若不在该摄像头负责的工作区域内则除去。比如,由于摄像头201对应的检测框集合(a 1,a 2,a 3)中的检测框a 3的地面坐标在X区域之外,因此,在摄像头201对应的检测框集合中去除检测框a 3,得到(a 1,a 2)进行后续的操作。 For example, as shown in FIG. 2, in order to avoid blind spots in the surveillance scene, the framing areas of the camera 201 and the camera 202 actually overlap. On this basis, in order to effectively avoid coordinate display conflicts, a working area can be assigned to each camera; for example, the working area of the camera 201 is the X area and the working area of the camera 202 is the Y area, so that the working areas of the cameras adjoin each other. Further, the ground coordinates of each detection frame corresponding to each camera need to be located in the working area of that camera, and the detection frame is removed if they are not. For example, since the ground coordinates of the detection frame a 3 in the detection frame set (a 1 , a 2 , a 3 ) corresponding to the camera 201 are outside the X area, the detection frame a 3 is removed from the detection frame set corresponding to the camera 201, leaving (a 1 , a 2 ) for subsequent operations.
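Discarding detection frames whose ground coordinates fall outside a camera's assigned working area can be sketched as a point-in-region filter; rectangular areas are an assumption made for illustration, and real working areas may be arbitrary polygons:

```python
def in_area(pt, area):
    # area = (xmin, ymin, xmax, ymax) of the camera's assigned working area
    x, y = pt
    xmin, ymin, xmax, ymax = area
    return xmin <= x <= xmax and ymin <= y <= ymax

def filter_detections(boxes_with_ground, area):
    """boxes_with_ground: list of (box_label, ground_point).
    Keep only detections whose ground point lies inside this camera's area."""
    return [(box, pt) for box, pt in boxes_with_ground if in_area(pt, area)]
```

In the example above, a detection like a 3 whose ground point lies outside the X area would be dropped, leaving (a 1 , a 2 ) for the tracking stage.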
在一个可能的实施方式中,方法还包括:将每个摄像头的工作区域中的非关键区域截去。具体地,可以基于监控场景的具体布局确定是否为关键区域,比如,对于行人无法通过的天花板区域,就可以直接截去,这样能够减少目标跟踪的运算量。In a possible implementation, the method further includes: cutting off non-critical areas in the working area of each camera. Specifically, whether it is a critical area can be determined based on the specific layout of the monitoring scene. For example, the ceiling area that cannot be passed by pedestrians can be directly cut off, which can reduce the amount of calculation for target tracking.
如图1所示,该方法100还可以包括:As shown in FIG. 1, the method 100 may further include:
步骤S103、根据每个摄像头对应的检测框集合进行目标跟踪,根据跟踪结果更新全局目标轨迹。Step S103: Perform target tracking according to the detection frame set corresponding to each camera, and update the global target trajectory according to the tracking result.
具体地,如上文所述,针对每个摄像头,可以根据初始的当前待测帧A 1和B 1进行目标检测,确定初始的全局目标轨迹。进一步地,可以根据后续获取的当前待测帧A n和B n进行目标检测,并根据目标检测结果迭代地进行目标跟踪,从而对全局目标轨迹进行迭代更新。 Specifically, as described above, for each camera, target detection can be performed according to the initial current frame to be measured A 1 and B 1 to determine the initial global target trajectory. Further, target detection may be performed according to the current frames A n and B n to be tested subsequently obtained, and target tracking may be performed iteratively according to the target detection result, so as to iteratively update the global target trajectory.
在一个可能的实施方式中,根据每个摄像头对应的检测框集合进行跟踪,包括:采用多目标跟踪算法,并基于每个摄像头对应的检测框集合进行多目标跟踪,确定每个摄像头对应的局部跟踪信息;其中,多目标跟踪采用的参数基于每个摄像头的历史待测帧而确定。这样能够实现监控区域中的多目标跟踪。In a possible implementation manner, performing tracking according to the detection frame set corresponding to each camera includes: adopting a multi-target tracking algorithm, performing multi-target tracking based on the detection frame set corresponding to each camera, and determining the local tracking information corresponding to each camera; wherein the parameters used in the multi-target tracking are determined based on the historical frames to be tested of each camera. This enables multi-target tracking in the monitoring area.
具体地,多目标跟踪算法是基于单摄像头的目标跟踪算法,例如DeepSORT算法(基于深度特征关联的简单在线实时跟踪算法,Simple Online and Realtime Tracking with a Deep Association Metric),因此可以得到每个摄像头的局部跟踪信息。其中,多目标跟踪采用的参数基于每个摄像头的历史待测帧而确定,具体而言,可以在任意一个目标初次出现在某个摄像头的工作区域时确定待跟踪的目标框,并基于多目标检测算法和已经标注身份的目标框对该摄像头的后续待测帧进行跟踪,确定该目标在该摄像头工作区域中的局部跟踪信息。Specifically, the multi-target tracking algorithm is a single-camera target tracking algorithm, such as the DeepSORT algorithm (Simple Online and Realtime Tracking with a Deep Association Metric), so the local tracking information of each camera can be obtained. The parameters used in the multi-target tracking are determined based on the historical frames to be tested of each camera; specifically, the target frame to be tracked can be determined when any target first appears in the working area of a certain camera, and the subsequent frames to be tested of that camera are tracked based on the multi-target detection algorithm and the target frames that have been assigned identities, so as to determine the local tracking information of the target in the working area of that camera.
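The following is a deliberately simplified stand-in for the per-camera association step, using only greedy nearest-neighbor matching on ground coordinates; the actual DeepSORT algorithm additionally combines appearance features with Kalman-filter motion prediction, so this sketch illustrates the data flow only:

```python
def associate(tracks, points, max_dist=2.0):
    """tracks: {track_id: last_ground_point}; points: new ground points.
    Returns (matches {track_id: point}, unmatched_points).
    Greedy nearest-neighbor association within a distance threshold --
    a toy stand-in for DeepSORT's appearance + motion matching."""
    matches, unmatched = {}, []
    free = dict(tracks)  # tracks not yet claimed by a detection
    for pt in points:
        best, best_d = None, max_dist
        for tid, last in free.items():
            d = ((pt[0] - last[0]) ** 2 + (pt[1] - last[1]) ** 2) ** 0.5
            if d < best_d:
                best, best_d = tid, d
        if best is None:
            unmatched.append(pt)       # candidate new target
        else:
            matches[best] = pt         # continuation of an existing track
            del free[best]
    return matches, unmatched
```

The `max_dist` threshold plays the role of the tracking parameters that, per the text, are determined from each camera's historical frames to be tested.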
在一个可能的实施方式中,多目标跟踪算法为deepsort算法。当然,也可以采用其他的目标跟踪算法,本领域的技术人员可以理解,本公开所要强调的不是具体采用何种目标跟踪算法。In a possible implementation, the multi-target tracking algorithm is a deepsort algorithm. Of course, other target tracking algorithms can also be used, and those skilled in the art can understand that what the present disclosure intends to emphasize is not which target tracking algorithm is specifically used.
在一个可能的实施方式中,根据跟踪结果更新全局目标轨迹,还包括:根据每个摄像头对应的局部跟踪信息为每个检测框添加身份标识;基于身份标识,利用每个检测框的地面坐标对全局目标轨迹进行更新。In a possible implementation manner, updating the global target trajectory according to the tracking result further includes: adding an identity to each detection frame according to the local tracking information corresponding to each camera; and, based on the identity, updating the global target trajectory using the ground coordinates of each detection frame.
例如,如图6所示,其中的曲线部分示出了当前已有的全局目标轨迹,也即是在上一次迭代过程中确定的全局目标轨迹,且其中的点a 1、点a 2和点b分别表示图5中所示出多个检测框的地面坐标。其中,若摄像头201对应的局部跟踪信息指示检测框a 2和已有的“目标2”特征匹配,则为检测框a 2标注“目标2”并将点a 2的地面坐标加入“目标2”的现有轨迹中(即图6中的“目标2”虚曲线),若摄像头201对应的局部跟踪信息指示检测框点a 1并不存在匹配目标,则为检测框a 1新增一个标注“目标3”,并新创建“目标3”的轨迹。 For example, as shown in FIG. 6, the curve portions show the currently existing global target trajectories, that is, the global target trajectories determined in the previous iteration, and the points a 1 , a 2 and b respectively represent the ground coordinates of the multiple detection frames shown in FIG. 5. If the local tracking information corresponding to the camera 201 indicates that the detection frame a 2 matches the features of the existing "target 2", the detection frame a 2 is labeled "target 2" and the ground coordinates of the point a 2 are added to the existing trajectory of "target 2" (i.e., the dashed "target 2" curve in FIG. 6); if the local tracking information corresponding to the camera 201 indicates that no matching target exists for the detection frame a 1 , a new label "target 3" is added for the detection frame a 1 and a new trajectory of "target 3" is created.
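Appending matched detections to existing trajectories and opening a new trajectory for an unmatched detection, as in the "target 2" / "target 3" example above, can be sketched as follows (the data layout is an assumption for illustration):

```python
def update_trajectories(trajectories, matches, unmatched, next_id):
    """trajectories: {target_id: [ground points]}.
    matches: {target_id: point} from local tracking;
    unmatched: ground points with no matching existing target."""
    for tid, pt in matches.items():
        trajectories.setdefault(tid, []).append(pt)   # e.g. a2 -> "target 2"
    for pt in unmatched:                              # e.g. a1 -> new "target 3"
        trajectories[f"target {next_id}"] = [pt]
        next_id += 1
    return trajectories, next_id
```

Running this once per sampled frame yields exactly the iterative global-trajectory update described in step S103.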
在一个可能的实施方式中,根据跟踪结果更新全局目标轨迹,还包括:根据多个摄像头的工作区域确定多个摄像头之间的关联关系;根据每个摄像头的局部跟踪信息确定对应工作区域中的新增检测框和消失检测框;根据多个摄像头之间的关联关系对处于不同工作区域中的新增检测框和消失检测框进行关联,得到关联信息;根据关联信息更新全局目标轨迹。In a possible implementation manner, updating the global target trajectory according to the tracking result further includes: determining the association relationship between the multiple cameras according to the working areas of the multiple cameras; determining newly appearing detection frames and disappearing detection frames in the corresponding working area according to the local tracking information of each camera; associating newly appearing detection frames and disappearing detection frames located in different working areas according to the association relationship between the multiple cameras to obtain association information; and updating the global target trajectory according to the association information.
具体地,其中多个摄像头之间的关联关系比如是区域X和区域Y在指定位置相邻接,从而在目标移动时能够基于上述关联关系从邻接位置处跨越不同的工作区域。其中,关联信息是指某一工作区域中的新增检测框和另一工作区域中的消失检测框实现关联,也即对应为同一身份标识。换句话说,针对具有邻接边界的两个工作区域,可以在其中一个工作区域的邻接边界处先获取多个跟踪目标的消失次序,在另一工作区域中按照上述消失次序对出现于该邻接边界处的多个新增目标进行对应的标识分配并持续跟踪。Specifically, the association relationship between the multiple cameras is, for example, that the area X and the area Y adjoin each other at a specified position, so that a moving target can cross between different working areas at the adjoining position based on the above association relationship. The association information means that a newly appearing detection frame in one working area is associated with a disappearing detection frame in another working area, that is, both correspond to the same identity. In other words, for two working areas with an adjoining boundary, the disappearance order of multiple tracked targets can first be obtained at the adjoining boundary of one working area; in the other working area, corresponding identities are then assigned, in the above disappearance order, to the multiple newly appearing targets at that adjoining boundary, and tracking is continued.
例如,如图6所示,其中区域Y中的点b表示图5中所示出检测框b的地面坐标。若摄像头202对应的局部跟踪信息指示检测框点b并不存在匹配目标,也即在区域Y中存在新增目标;并且摄像头201对应的局部跟踪信息指示所持续跟踪的“目标1”在当前检测帧消失,也即在区域X中存在消失目标,则可以为检测框b标注“目标1”并将点b的地面坐标加入“目标1”的现有轨迹中(即图6中的“目标1”虚曲线),实现跨摄像头、跨工作区域的目标跟踪。For example, as shown in FIG. 6, the point b in the area Y represents the ground coordinates of the detection frame b shown in FIG. 5. If the local tracking information corresponding to the camera 202 indicates that no matching target exists for the detection frame point b, that is, there is a newly appearing target in the area Y, and the local tracking information corresponding to the camera 201 indicates that the continuously tracked "target 1" disappears in the current frame to be tested, that is, there is a disappearing target in the area X, then the detection frame b can be labeled "target 1" and the ground coordinates of the point b can be added to the existing trajectory of "target 1" (i.e., the dashed "target 1" curve in FIG. 6), thereby achieving target tracking across cameras and across working areas.
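The boundary handover described above — assigning the identities of targets that disappeared at one area's boundary to targets newly appearing at the adjacent boundary, in disappearance order — can be sketched with a FIFO queue; this simple ordering rule is exactly the assumption stated in the text, and the function names are hypothetical:

```python
from collections import deque

def handover(vanished, appeared):
    """vanished: ids that left area X at the shared boundary, oldest first;
    appeared: new boundary detections in area Y, in order of appearance.
    Reassigns ids in disappearance order; detections left over when the
    queue is empty receive no inherited id here (they become new targets)."""
    queue = deque(vanished)
    assigned = []
    for det in appeared:
        assigned.append((det, queue.popleft() if queue else None))
    return assigned
```

In the FIG. 6 example, "target 1" vanishes from area X and detection b appears in area Y, so b inherits the label "target 1" and its trajectory continues across the two working areas.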
这样,根据本发明实施方式的基于多摄像头的目标跟踪方法,通过依次对来自各个摄像头的当前待测帧进行图像检测,然后基于对应于各个摄像头的检测结果在监控区域中进行全局追踪,可以基于较少的计算资源实现对多路监控视频中的目标对象实现全局的跟踪,降低对计算资源的需求。例如,无需为各个摄像头单独提供用于跟踪各个局部区域中的目标对象的GPU计算资源,而可以提供较少的计算资源以用于在监控区域中进行目标对象的全局跟踪。In this way, according to the multi-camera-based target tracking method of the embodiments of the present invention, image detection is performed sequentially on the current frames to be tested from each camera, and global tracking is then performed in the monitoring area based on the detection results corresponding to each camera, so that global tracking of target objects in multi-channel surveillance video can be achieved with fewer computing resources, reducing the demand for computing resources. For example, there is no need to separately provide GPU computing resources for each camera to track target objects in each local area; instead, fewer computing resources can be provided for global tracking of target objects in the monitoring area.
基于相同的技术构思,本发明实施例还提供一种目标跟踪装置,用于执行上述任一实施例所提供的目标跟踪方法。图7为本发明实施例提供的一种目标跟踪装置结构示意图。Based on the same technical concept, an embodiment of the present invention also provides a target tracking device for executing the target tracking method provided in any of the foregoing embodiments. Fig. 7 is a schematic structural diagram of a target tracking device provided by an embodiment of the present invention.
如图7所示,装置700包括:As shown in FIG. 7, the apparatus 700 includes:
获取单元701,用于获取设置于监控区域内的多个摄像头的当前待测帧;The acquiring unit 701 is configured to acquire current frames to be measured of multiple cameras arranged in the monitoring area;
检测单元702,用于依次对多个摄像头中每个摄像头的当前待测帧进行目标检测,得到每个摄像头对应的检测框集合;The detection unit 702 is configured to sequentially perform target detection on the current frame to be tested of each of the multiple cameras to obtain a set of detection frames corresponding to each camera;
跟踪单元703,用于根据每个摄像头对应的检测框集合进行目标跟踪,根据跟踪结果确定全局目标轨迹。The tracking unit 703 is configured to perform target tracking according to the detection frame set corresponding to each camera, and determine the global target trajectory according to the tracking result.
在一些可能的实施方式中,装置700还包括:选帧单元,用于确定多个待测帧序号,根据多个待测帧序号按时序地迭代获取多个摄像头的当前待测帧,从而迭代地执行目标跟踪;其中,根据多个待测帧序号中初始待测帧序号对应得到初始的全局目标轨迹;根据多个待测帧序号中后续待测帧序号对应得到迭代更新后的全局目标轨迹。In some possible implementation manners, the apparatus 700 further includes: a frame selection unit, configured to determine a plurality of frame numbers to be tested, and iteratively acquire the current frames to be tested of the multiple cameras in time sequence according to the plurality of frame numbers to be tested, so as to iteratively perform target tracking; wherein the initial global target trajectory is obtained according to the initial frame number among the plurality of frame numbers to be tested, and the iteratively updated global target trajectory is obtained according to the subsequent frame numbers among the plurality of frame numbers to be tested.
在一些可能的实施方式中,检测单元702,还用于:将每个摄像头的当前待测帧输入目标检测模型进行目标检测;其中,目标检测模型是基于神经网络训练得到的行人检测模型。In some possible implementation manners, the detection unit 702 is further configured to: input the current frame to be measured of each camera into a target detection model for target detection; wherein the target detection model is a pedestrian detection model obtained based on neural network training.
在一些可能的实施方式中,检测单元702,还用于:在得到每个摄像头对应的检测框集合之后,根据每个摄像头的取景位置对每个摄像头对应的检测框集合中的每个检测框的框底中心点进行投影变换,从而确定每个检测框的地面坐标。In some possible implementation manners, the detection unit 702 is further configured to: after obtaining the detection frame set corresponding to each camera, perform projection transformation on the bottom-center point of each detection frame in the detection frame set corresponding to each camera according to the framing position of each camera, so as to determine the ground coordinates of each detection frame.
在一些可能的实施方式中,多个摄像头的取景区域至少部分地重叠,装置700还用于:根据每个摄像头的取景区域在地面坐标系中划分每个摄像头的工作区域;其中,每个摄像头的工作区域互不重叠,若多个摄像头中的第一摄像头对应的任意一个检测框的地面坐标超出对应的工作区域,则在第一摄像头的检测框集合中去除任意一个检测框。In some possible implementation manners, the framing areas of the multiple cameras overlap at least partially, and the apparatus 700 is further configured to: divide the working area of each camera in the ground coordinate system according to the framing area of each camera; wherein the working areas of the cameras do not overlap each other, and if the ground coordinates of any detection frame corresponding to a first camera among the multiple cameras fall outside the corresponding working area, that detection frame is removed from the detection frame set of the first camera.
在一些可能的实施方式中,检测单元702,还用于:将每个摄像头的工作区域中的非关键区域截去。In some possible implementation manners, the detection unit 702 is also used to cut off non-critical areas in the working area of each camera.
在一些可能的实施方式中,跟踪单元703,还用于:采用多目标跟踪算法,并基于每个摄像头对应的检测框集合进行多目标跟踪,确定每个摄像头对应的局部跟踪信息;其中,多目标跟踪采用的参数基于每个摄像头的历史待测帧而确定。In some possible implementation manners, the tracking unit 703 is further configured to: adopt a multi-target tracking algorithm, perform multi-target tracking based on the detection frame set corresponding to each camera, and determine the local tracking information corresponding to each camera; wherein the parameters used in the multi-target tracking are determined based on the historical frames to be tested of each camera.
在一些可能的实施方式中,多目标跟踪算法为deepsort算法。In some possible implementation manners, the multi-target tracking algorithm is a deepsort algorithm.
在一些可能的实施方式中,跟踪单元703,还用于:根据每个摄像头对应的局部跟踪信息为每个检测框添加身份标识;基于每个检测框的身份标识和地面坐标确定迭代更新后的全局目标轨迹。In some possible implementation manners, the tracking unit 703 is further configured to: add an identity to each detection frame according to the local tracking information corresponding to each camera; and determine the iteratively updated global target trajectory based on the identity and ground coordinates of each detection frame.
在一些可能的实施方式中,跟踪单元703,还用于:根据多个摄像头的工作区域确定多个摄像头之间的关联关系;根据每个摄像头的局部跟踪信息确定对应工作区域中的新增检测框和消失检测框;根据多个摄像头之间的关联关系对处于不同工作区域中的新增检测框和消失检测框进行关联,得到关联信息;根据关联信息确定迭代更新后的全局目标轨迹。In some possible implementation manners, the tracking unit 703 is further configured to: determine the association relationship between the multiple cameras according to the working areas of the multiple cameras; determine newly appearing detection frames and disappearing detection frames in the corresponding working area according to the local tracking information of each camera; associate newly appearing detection frames and disappearing detection frames located in different working areas according to the association relationship between the multiple cameras to obtain association information; and determine the iteratively updated global target trajectory according to the association information.
这样,根据本发明实施方式的基于多摄像头的目标跟踪装置,通过依次对来自各个摄像头的当前待测帧进行图像检测,然后基于对应于各个摄像头的检测结果在监控区域中进行全局追踪,可以基于较少的计算资源实现对多路监控视频中的目标对象实现全局的跟踪,降低对计算资源的需求。例如,无需为各个摄像头单独提供用于跟踪各个局部区域中的目标对象的GPU计算资源,而可以提供较少的计算资源以用于在监控区域中进行目标对象的全局跟踪。In this way, according to the multi-camera-based target tracking apparatus of the embodiments of the present invention, image detection is performed sequentially on the current frames to be tested from each camera, and global tracking is then performed in the monitoring area based on the detection results corresponding to each camera, so that global tracking of target objects in multi-channel surveillance video can be achieved with fewer computing resources, reducing the demand for computing resources. For example, there is no need to separately provide GPU computing resources for each camera to track target objects in each local area; instead, fewer computing resources can be provided for global tracking of target objects in the monitoring area.
需要说明的是,本申请实施例中的装置可以实现前述方法的实施例的各个过程,并达到相同的效果和功能,这里不再赘述。It should be noted that the device in the embodiment of the present application can implement each process of the foregoing method embodiment, and achieve the same effect and function, which will not be repeated here.
基于相同的技术构思,本发明实施例还提供一种目标跟踪系统,具体包括:设置于监控区域内的多个摄像头,以及与多个摄像头分别通信连接的目标跟踪装置;其中,目标跟踪装置被配置用于执行上述任一实施例所提供的目标跟踪方法。Based on the same technical concept, an embodiment of the present invention also provides a target tracking system, which specifically includes: a plurality of cameras arranged in a monitoring area, and a target tracking device respectively communicatively connected with the plurality of cameras; wherein the target tracking device is It is configured to execute the target tracking method provided in any of the above embodiments.
基于相同的技术构思,所属技术领域的技术人员能够理解,本发明的各个方面可以实现为设备、方法或计算机可读存储介质。因此,本发明的各个方面可以具体实现为以下形式,即:完全的硬件实施方式、完全的软件实施方式(包括固件、微代码等),或硬件和软件方面结合的实施方式,这里可以统称为“电路”、“模块”或“设备”。Based on the same technical concept, those skilled in the art can understand that various aspects of the present invention can be implemented as devices, methods, or computer-readable storage media. Therefore, various aspects of the present invention can be specifically implemented in the following forms, namely: complete hardware implementation, complete software implementation (including firmware, microcode, etc.), or a combination of hardware and software implementations, which can be collectively referred to herein as "Circuit", "Module" or "Equipment".
在一些可能的实施方式中,本发明的一种目标跟踪装置可以至少包括一个或多个处理器、以及至少一个存储器。其中,所述存储器存储有程序,当所述程序被所述处理器执行时,使得所述处理器执行如图1所示的步骤:获取设置于监控区域内的多个摄像头的当前待测帧;依次对多个摄像头中每个摄像头的当前待测帧进行目标检测,得到每个摄像头对应的检测框集合;根据每个摄像头对应的检测框集合进行目标跟踪,根据跟踪结果确定全局目标轨迹。In some possible implementation manners, a target tracking apparatus of the present invention may include at least one or more processors and at least one memory. The memory stores a program which, when executed by the processor, causes the processor to perform the steps shown in FIG. 1: acquiring the current frames to be tested of multiple cameras arranged in the monitoring area; sequentially performing target detection on the current frame to be tested of each of the multiple cameras to obtain a detection frame set corresponding to each camera; and performing target tracking according to the detection frame set corresponding to each camera, and determining the global target trajectory according to the tracking result.
下面参照图8来描述根据本发明的这种实施方式的目标跟踪装置8。图8显示的装置8仅仅是一个示例,不应对本发明实施例的功能和使用范围带来任何限制。The target tracking device 8 according to this embodiment of the present invention will be described below with reference to FIG. 8. The device 8 shown in FIG. 8 is only an example, and should not bring any limitation to the function and application scope of the embodiment of the present invention.
如图8所示,装置8可以以通用计算设备的形式表现,包括但不限于:至少一个处理器10、至少一个存储器20、连接不同设备组件的总线60。As shown in FIG. 8, the apparatus 8 may be in the form of a general-purpose computing device, including but not limited to: at least one processor 10, at least one memory 20, and a bus 60 connecting different device components.
总线60包括数据总线、地址总线和控制总线。The bus 60 includes a data bus, an address bus, and a control bus.
存储器20可以包括易失性存储器,例如随机存取存储器(RAM)21和/或高速缓存存储器22,还可以进一步包括只读存储器(ROM)23。The memory 20 may include a volatile memory, such as a random access memory (RAM) 21 and/or a cache memory 22, and may further include a read-only memory (ROM) 23.
存储器20还可以包括程序模块24,这样的程序模块24包括但不限于:操作设备、一个或者多个应用程序、其它程序模块以及程序数据,这些示例中的每一个或某种组合中可能包括网络环境的实现。The memory 20 may also include a program module 24. Such program module 24 includes, but is not limited to, an operating device, one or more application programs, other program modules, and program data. Each of these examples or a certain combination may include a network. The realization of the environment.
装置8还可以与一个或多个外部设备2(例如键盘、指向设备、蓝牙设备等)通信,也可与一个或者多个其他设备进行通信。这种通信可以通过输入/输出(I/O)接口40进行,并在显示单元30上进行显示。并且,装置8还可以通过网络适配器50与一个或者多个网络(例如局域网(LAN),广域网(WAN)和/或公共网络,例如因特网)通信。如图所示,网络适配器50通过总线60与装置8中的其它模块通信。应当明白,尽管图中未示出,但可以结合装置8使用其它硬件和/或软件模块,包括但不限于:微代码、设备驱动器、冗余处理单元、外部磁盘驱动阵列、RAID设备、磁带驱动器以及数据备份存储设备等。The device 8 can also communicate with one or more external devices 2 (for example, a keyboard, a pointing device, a Bluetooth device, etc.), and can also communicate with one or more other devices. Such communication can be performed through an input/output (I/O) interface 40 and displayed on the display unit 30. In addition, the device 8 may also communicate with one or more networks (for example, a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through the network adapter 50. As shown in the figure, the network adapter 50 communicates with other modules in the device 8 through the bus 60. It should be understood that although not shown in the figure, other hardware and/or software modules can be used in conjunction with the device 8, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID devices, tape drives, and data backup storage devices.
图9示出了一种计算机可读存储介质,用于执行如上所述的方法。Fig. 9 shows a computer-readable storage medium for executing the method as described above.
在一些可能的实施方式中,本发明的各个方面还可以实现为一种计算机可读存储介质的形式,其包括程序代码,当所述程序代码在被处理器执行时,所述程序代码用于使所述处理器执行上面描述的方法。In some possible implementation manners, various aspects of the present invention can also be implemented in the form of a computer-readable storage medium, which includes program code; when the program code is executed by a processor, the program code causes the processor to execute the method described above.
上面描述的方法包括了上面的附图中示出和未示出的多个操作和步骤,这里将不再赘述。The above-described method includes multiple operations and steps shown and not shown in the above drawings, which will not be repeated here.
所述计算机可读存储介质可以采用一个或多个可读介质的任意组合。可读介质可以是可读信号介质或者可读存储介质。可读存储介质例如可以是——但不限于——电、磁、光、电磁、红外线、或半导体的系统、设备或器件,或者任意以上的组合。可读存储介质的更具体的例子(非穷举的列表)包括:具有一个或多个导线的电连接、便携式盘、硬盘、随机存取存储器(RAM)、只读存储器(ROM)、可擦式可编程只读存储器(EPROM或闪存)、光纤、便携式紧凑盘只读存储器(CD-ROM)、光存储器件、磁存储器件、或者上述的任意合适的组合。The computer-readable storage medium may adopt any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
如图9所示,描述了根据本发明的实施方式的计算机可读存储介质90,其可以采用便携式紧凑盘只读存储器(CD-ROM)并包括程序代码,并可以在终端设备,例如个人电脑上运行。然而,本发明的计算机可读存储介质不限于此,在本文件中,可读存储介质可以是任何包含或存储程序的有形介质,该程序可以被指令执行系统、设备或者器件使用或者与其结合使用。As shown in FIG. 9, a computer-readable storage medium 90 according to an embodiment of the present invention is described; it can adopt a portable compact disk read-only memory (CD-ROM) and include program code, and can run on a terminal device such as a personal computer. However, the computer-readable storage medium of the present invention is not limited to this; in this document, the readable storage medium can be any tangible medium that contains or stores a program, and the program can be used by or in combination with an instruction execution system, apparatus, or device.
可以以一种或多种程序设计语言的任意组合来编写用于执行本发明操作的程序代码,所述程序设计语言包括面向对象的程序设计语言—诸如Java、Python、C++等,还包括常规的过程式程序设计语言—诸如“C”语言或类似的程序设计语言。程序代码可以完全地在用户计算设备上执行、部分地在用户设备上执行、部分在远程计算设备上执行,或者完全在远程计算设备或服务器上执行。在涉及远程计算设备的情形中,远程计算设备可以通过任意种类的网络——包括局域网(LAN)或广域网(WAN)——连接到用户计算设备,或者,可以连接到外部计算设备(例如利用因特网服务提供商来通过因特网连接)。The program code for performing the operations of the present invention can be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Python, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code can be executed entirely on the user's computing device, partly on the user's device and partly on a remote computing device, or entirely on a remote computing device or server. In the case of a remote computing device, the remote computing device can be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or can be connected to an external computing device (for example, through the Internet using an Internet service provider).
此外,尽管在附图中以特定顺序描述了本发明方法的操作,但是,这并非要求或者暗示必须按照该特定顺序来执行这些操作,或是必须执行全部所示的操作才能实现期望的结果。附加地或备选地,可以省略某些步骤,将多个步骤合并为一个步骤执行,和/或将一个步骤分解为多个步骤执行。In addition, although the operations of the method of the present invention are described in a specific order in the drawings, this does not require or imply that these operations must be performed in the specific order, or that all the operations shown must be performed to achieve the desired result. Additionally or alternatively, some steps may be omitted, multiple steps may be combined into one step for execution, and/or one step may be decomposed into multiple steps for execution.
虽然已经参考若干具体实施方式描述了本发明的精神和原理,但是应该理解,本发明并不限于所公开的具体实施方式,对各方面的划分也不意味着这些方面中的特征不能组合以进行受益,这种划分仅是为了表述的方便。本发明旨在涵盖所附权利要求的精神和范围内所包括的各种修改和等同布置。Although the spirit and principles of the present invention have been described with reference to several specific embodiments, it should be understood that the present invention is not limited to the disclosed specific embodiments, and the division into various aspects does not mean that features in these aspects cannot be combined to benefit; this division is only for convenience of presentation. The present invention is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (23)

  1. A target tracking method, comprising:
    acquiring current frames to be detected from a plurality of cameras arranged in a monitored area;
    performing target detection on the current frame of each of the plurality of cameras in turn to obtain a detection box set corresponding to each camera; and
    performing target tracking according to the detection box set corresponding to each camera, and determining a global target trajectory according to the tracking result.
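The three steps recited in claim 1 can be sketched as a minimal pipeline. This is an illustrative sketch only, not part of the claims; `detect` and `track` are assumed stand-in callables, and the toy "frames" are plain lists rather than images:

```python
def track_targets(frames, detect, track):
    """Sketch of the claimed method: run per-camera detection on the current
    frames, then track over the per-camera detection sets to produce the
    global result. `frames` maps camera id -> current frame."""
    detections = {cam_id: detect(frame) for cam_id, frame in frames.items()}
    return track(detections)

# Toy stand-ins: "detection" returns indices of non-zero pixels,
# "tracking" just counts detections per camera.
frames = {"cam1": [0, 1, 1], "cam2": [1, 0, 0]}
detect = lambda frame: [i for i, v in enumerate(frame) if v]
track = lambda dets: {cid: len(d) for cid, d in dets.items()}
print(track_targets(frames, detect, track))  # {'cam1': 2, 'cam2': 1}
```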
  2. The method according to claim 1, further comprising:
    determining a plurality of frame numbers to be detected, and iteratively acquiring the current frames of the plurality of cameras in chronological order according to the plurality of frame numbers, so as to iteratively perform the target tracking;
    wherein the initial global target trajectory is obtained from the initial frame number among the plurality of frame numbers to be detected, and the iteratively updated global target trajectory is obtained from the subsequent frame numbers among the plurality of frame numbers to be detected.
  3. The method according to claim 2, wherein performing target detection on the current frame of each camera comprises:
    inputting the current frame of each camera into a target detection model to perform the target detection;
    wherein the target detection model is a pedestrian detection model trained on a neural network.
  4. The method according to claim 2, further comprising, after obtaining the detection box set corresponding to each camera:
    performing a projective transformation on the bottom-center point of each detection box in the detection box set corresponding to each camera according to the viewing position of that camera, so as to determine the ground coordinates of each detection box.
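The projective transformation in claim 4 can be sketched as a planar homography applied to each box's bottom-center point. This is an illustrative sketch, not part of the claims: the `(x, y, w, h)` box format and the homography `H` (which would in practice be calibrated per camera) are assumptions:

```python
import numpy as np

def box_bottom_to_ground(boxes, H):
    """Project each detection box's bottom-center point to ground-plane
    coordinates via a 3x3 homography H (image plane -> ground plane).
    boxes: iterable of (x, y, w, h) in pixels; returns a list of (gx, gy)."""
    ground = []
    for x, y, w, h in boxes:
        # bottom-center of the box in homogeneous image coordinates
        p = np.array([x + w / 2.0, y + h, 1.0])
        q = H @ p
        ground.append((q[0] / q[2], q[1] / q[2]))  # dehomogenize
    return ground

# With an identity homography the ground coordinates equal the
# image coordinates of the bottom-center point.
print(box_bottom_to_ground([(10, 20, 4, 6)], np.eye(3)))  # [(12.0, 26.0)]
```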
  5. The method according to claim 4, wherein the viewing areas of the plurality of cameras at least partially overlap, and the method further comprises:
    dividing the working area of each camera in a ground coordinate system according to the viewing area of that camera;
    wherein the working areas of the cameras do not overlap one another, and if the ground coordinates of any detection box corresponding to a first camera among the plurality of cameras fall outside the corresponding working area, that detection box is removed from the detection box set of the first camera.
  6. The method according to claim 5, further comprising:
    cutting off non-critical areas from the working area of each camera.
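The working-area filter in claim 5 amounts to a point-in-polygon test on each box's ground coordinate. A minimal sketch, not part of the claims; the polygon representation of the working area and the detection dictionaries are assumptions:

```python
def point_in_polygon(pt, poly):
    """Ray-casting test: is ground point pt inside the polygon
    (list of (x, y) vertices) that defines a camera's working area?"""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def filter_boxes(detections, work_area):
    """Drop detections whose ground coordinate falls outside the camera's
    (non-overlapping) working area, as recited in claim 5."""
    return [d for d in detections if point_in_polygon(d["ground"], work_area)]

area = [(0, 0), (10, 0), (10, 10), (0, 10)]
dets = [{"id": 1, "ground": (5, 5)}, {"id": 2, "ground": (15, 5)}]
print([d["id"] for d in filter_boxes(dets, area)])  # [1]
```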
  7. The method according to claim 2, wherein tracking according to the detection box set corresponding to each camera comprises:
    performing multi-target tracking with a multi-target tracking algorithm based on the detection box set corresponding to each camera, and determining local tracking information corresponding to each camera;
    wherein the parameters used for the multi-target tracking are determined based on the historical frames of each camera.
  8. The method according to claim 7, wherein the multi-target tracking algorithm is the DeepSORT algorithm.
  9. The method according to claim 7, further comprising:
    adding an identity to each detection box according to the local tracking information corresponding to each camera; and
    determining the iteratively updated global target trajectory based on the identity and ground coordinates of each detection box.
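Claim 9's accumulation of identities and ground coordinates into a global trajectory can be sketched as a per-identity history. This is an illustrative sketch, not part of the claims; the box dictionaries and the `(frame, gx, gy)` record layout are assumptions:

```python
from collections import defaultdict

def update_global_trajectories(trajectories, frame_idx, tracked_boxes):
    """Append each tracked box's ground coordinate to the trajectory of its
    identity; trajectories maps identity -> list of (frame_idx, gx, gy)."""
    for box in tracked_boxes:
        trajectories[box["id"]].append((frame_idx, *box["ground"]))
    return trajectories

traj = defaultdict(list)
update_global_trajectories(traj, 0, [{"id": "A", "ground": (1.0, 2.0)}])
update_global_trajectories(traj, 5, [{"id": "A", "ground": (1.5, 2.5)}])
print(traj["A"])  # [(0, 1.0, 2.0), (5, 1.5, 2.5)]
```

Iterating this over successive frame numbers to be detected yields the iteratively updated global trajectory of claim 2.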
  10. The method according to claim 7, further comprising:
    determining an association relationship among the plurality of cameras according to the working areas of the plurality of cameras;
    determining newly appeared detection boxes and disappeared detection boxes in the corresponding working area according to the local tracking information of each camera;
    associating newly appeared detection boxes and disappeared detection boxes located in different working areas according to the association relationship among the plurality of cameras to obtain association information; and
    determining the iteratively updated global target trajectory according to the association information.
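The cross-camera association in claim 10 can be sketched as a greedy nearest-neighbor match on the ground plane between boxes that disappeared from one working area and boxes that newly appeared in a neighboring one. An illustrative sketch only, not part of the claims; the distance threshold and the id/coordinate dictionaries are assumptions:

```python
import math

def associate_handoffs(disappeared, appeared, max_dist=1.0):
    """Greedily pair disappeared boxes with newly appeared boxes by
    ground-plane distance. disappeared/appeared map local track id ->
    (gx, gy); returns a list of (old_id, new_id) association pairs."""
    pairs = []
    used = set()
    for old_id, (ox, oy) in disappeared.items():
        best, best_d = None, max_dist
        for new_id, (nx, ny) in appeared.items():
            if new_id in used:
                continue
            d = math.hypot(ox - nx, oy - ny)
            if d <= best_d:
                best, best_d = new_id, d
        if best is not None:
            pairs.append((old_id, best))
            used.add(best)
    return pairs

gone = {"cam1_7": (4.9, 5.0)}
new = {"cam2_3": (5.0, 5.1), "cam2_4": (20.0, 20.0)}
print(associate_handoffs(gone, new))  # [('cam1_7', 'cam2_3')]
```

The resulting pairs are the association information from which the trajectories of the two local tracks can be merged into one global trajectory.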
  11. A target tracking apparatus, comprising:
    an acquiring unit, configured to acquire current frames to be detected from a plurality of cameras arranged in a monitored area;
    a detection unit, configured to perform target detection on the current frame of each of the plurality of cameras in turn to obtain a detection box set corresponding to each camera; and
    a tracking unit, configured to perform target tracking according to the detection box set corresponding to each camera, and determine a global target trajectory according to the tracking result.
  12. The apparatus according to claim 11, further comprising:
    a frame selection unit, configured to determine a plurality of frame numbers to be detected and iteratively acquire the current frames of the plurality of cameras in chronological order according to the plurality of frame numbers, so as to iteratively perform the target tracking;
    wherein the initial global target trajectory is obtained from the initial frame number among the plurality of frame numbers to be detected, and the iteratively updated global target trajectory is obtained from the subsequent frame numbers among the plurality of frame numbers to be detected.
  13. The apparatus according to claim 12, wherein the detection unit is further configured to:
    input the current frame of each camera into a target detection model to perform the target detection;
    wherein the target detection model is a pedestrian detection model trained on a neural network.
  14. The apparatus according to claim 12, wherein the detection unit is further configured to:
    after obtaining the detection box set corresponding to each camera, perform a projective transformation on the bottom-center point of each detection box in the detection box set corresponding to each camera according to the viewing position of that camera, so as to determine the ground coordinates of each detection box.
  15. The apparatus according to claim 14, wherein the viewing areas of the plurality of cameras at least partially overlap, and the apparatus is further configured to:
    divide the working area of each camera in a ground coordinate system according to the viewing area of that camera;
    wherein the working areas of the cameras do not overlap one another, and if the ground coordinates of any detection box corresponding to a first camera among the plurality of cameras fall outside the corresponding working area, that detection box is removed from the detection box set of the first camera.
  16. The apparatus according to claim 15, wherein the detection unit is further configured to:
    cut off non-critical areas from the working area of each camera.
  17. The apparatus according to claim 12, wherein the tracking unit is further configured to:
    perform multi-target tracking with a multi-target tracking algorithm based on the detection box set corresponding to each camera, and determine local tracking information corresponding to each camera;
    wherein the parameters used for the multi-target tracking are determined based on the historical frames of each camera.
  18. The apparatus according to claim 17, wherein the multi-target tracking algorithm is the DeepSORT algorithm.
  19. The apparatus according to claim 17, wherein the tracking unit is further configured to:
    add an identity to each detection box according to the local tracking information corresponding to each camera; and
    determine the iteratively updated global target trajectory based on the identity and ground coordinates of each detection box.
  20. The apparatus according to claim 17, wherein the tracking unit is further configured to:
    determine an association relationship among the plurality of cameras according to the working areas of the plurality of cameras;
    determine newly appeared detection boxes and disappeared detection boxes in the corresponding working area according to the local tracking information of each camera;
    associate newly appeared detection boxes and disappeared detection boxes located in different working areas according to the association relationship among the plurality of cameras to obtain association information; and
    determine the iteratively updated global target trajectory according to the association information.
  21. A target tracking system, comprising: a plurality of cameras arranged in a monitored area, and a target tracking apparatus communicatively connected to each of the plurality of cameras;
    wherein the target tracking apparatus is configured to perform the method according to any one of claims 1-10.
  22. A target tracking apparatus, comprising:
    one or more multi-core processors; and
    a memory configured to store one or more programs;
    wherein, when the one or more programs are executed by the one or more multi-core processors, the one or more multi-core processors are caused to:
    acquire current frames to be detected from a plurality of cameras arranged in a monitored area;
    perform target detection on the current frame of each of the plurality of cameras in turn to obtain a detection box set corresponding to each camera; and
    perform target tracking according to the detection box set corresponding to each camera, and determine a global target trajectory according to the tracking result.
  23. A computer-readable storage medium storing a program which, when executed by a multi-core processor, causes the multi-core processor to perform the method according to any one of claims 1-10.
PCT/CN2020/109081 2019-12-10 2020-08-14 Target tracking method, apparatus and system, and computer-readable storage medium WO2021114702A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911258014.2A CN111145213A (en) 2019-12-10 2019-12-10 Target tracking method, device and system and computer readable storage medium
CN201911258014.2 2019-12-10

Publications (1)

Publication Number Publication Date
WO2021114702A1 true WO2021114702A1 (en) 2021-06-17

Family

ID=70518015

Country Status (3)

Country Link
CN (1) CN111145213A (en)
TW (1) TWI795667B (en)
WO (1) WO2021114702A1 (en)

Families Citing this family (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111145213A (en) * 2019-12-10 2020-05-12 中国银联股份有限公司 Target tracking method, device and system and computer readable storage medium
CN111815675B (en) * 2020-06-30 2023-07-21 北京市商汤科技开发有限公司 Target object tracking method and device, electronic equipment and storage medium
CN111967498A (en) * 2020-07-20 2020-11-20 重庆大学 Night target detection and tracking method based on millimeter wave radar and vision fusion
CN112200841B (en) * 2020-09-30 2021-08-27 杭州海宴科技有限公司 Cross-domain multi-camera tracking method and device based on pedestrian posture
CN112381132A (en) * 2020-11-11 2021-02-19 上汽大众汽车有限公司 Target object tracking method and system based on fusion of multiple cameras
CN112418064A (en) * 2020-11-19 2021-02-26 上海交通大学 Real-time automatic detection method for number of people in library reading room
CN112560621A (en) * 2020-12-08 2021-03-26 北京大学 Identification method, device, terminal and medium based on animal image
CN112906452A (en) * 2020-12-10 2021-06-04 叶平 Automatic identification, tracking and statistics method and system for antelope buffalo deer
CN112489085A (en) * 2020-12-11 2021-03-12 北京澎思科技有限公司 Target tracking method, target tracking device, electronic device, and storage medium
CN112634332A (en) * 2020-12-21 2021-04-09 合肥讯图信息科技有限公司 Tracking method based on YOLOv4 model and DeepsORT model
CN112614159B (en) * 2020-12-22 2023-04-07 浙江大学 Cross-camera multi-target tracking method for warehouse scene
CN112906483B (en) * 2021-01-25 2024-01-23 中国银联股份有限公司 Target re-identification method, device and computer readable storage medium
CN112819859B (en) * 2021-02-02 2023-06-02 重庆特斯联智慧科技股份有限公司 Multi-target tracking method and device applied to intelligent security
CN113012223B (en) * 2021-02-26 2023-01-24 清华大学 Target flow monitoring method and device, computer equipment and storage medium
CN113223060B (en) * 2021-04-16 2022-04-15 天津大学 Multi-agent cooperative tracking method and device based on data sharing and storage medium
CN113257003A (en) * 2021-05-12 2021-08-13 上海天壤智能科技有限公司 Traffic lane-level traffic flow counting system, method, device and medium thereof
CN113473091B (en) * 2021-07-09 2023-04-18 杭州海康威视数字技术股份有限公司 Camera association method, device, system, electronic equipment and storage medium
CN115086527B (en) * 2022-07-04 2023-05-12 天翼数字生活科技有限公司 Household video tracking and monitoring method, device, equipment and storage medium
CN115619832B (en) * 2022-12-20 2023-04-07 浙江莲荷科技有限公司 Multi-camera collaborative multi-target track confirmation method, system and related device
CN116071686B (en) * 2023-02-27 2023-06-20 中国信息通信研究院 Correlation analysis method, device and system for cameras in industrial Internet
CN117784798B (en) * 2024-02-26 2024-05-31 安徽蔚来智驾科技有限公司 Target tracking method, intelligent device and computer readable storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104331901A (en) * 2014-11-26 2015-02-04 北京邮电大学 TLD-based multi-view target tracking device and method
CN104463900A (en) * 2014-12-31 2015-03-25 天津汉光祥云信息科技有限公司 Method for automatically tracking target among multiple cameras
CN108876821A (en) * 2018-07-05 2018-11-23 北京云视万维科技有限公司 Across camera lens multi-object tracking method and system
CN108986158A (en) * 2018-08-16 2018-12-11 新智数字科技有限公司 A kind of across the scene method for tracing identified again based on target and device and Computer Vision Platform
CN111145213A (en) * 2019-12-10 2020-05-12 中国银联股份有限公司 Target tracking method, device and system and computer readable storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI502558B (en) * 2013-09-25 2015-10-01 Chunghwa Telecom Co Ltd Traffic Accident Monitoring and Tracking System
CN104933392A (en) * 2014-03-19 2015-09-23 通用汽车环球科技运作有限责任公司 Probabilistic people tracking using multi-view integration
CN106845385A (en) * 2017-01-17 2017-06-13 腾讯科技(上海)有限公司 The method and apparatus of video frequency object tracking
CN108875588B (en) * 2018-05-25 2022-04-15 武汉大学 Cross-camera pedestrian detection tracking method based on deep learning
CN109903260B (en) * 2019-01-30 2023-05-23 华为技术有限公司 Image processing method and image processing apparatus
CN110428448B (en) * 2019-07-31 2021-05-14 腾讯科技(深圳)有限公司 Target detection tracking method, device, equipment and storage medium

Cited By (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113436295A (en) * 2021-06-25 2021-09-24 平安科技(深圳)有限公司 Living body breeding monitoring track drawing method, device, equipment and storage medium
CN113436295B (en) * 2021-06-25 2023-09-15 平安科技(深圳)有限公司 Living body culture monitoring track drawing method, device, equipment and storage medium
CN113592903A (en) * 2021-06-28 2021-11-02 北京百度网讯科技有限公司 Vehicle track recognition method and device, electronic equipment and storage medium
CN113592903B (en) * 2021-06-28 2024-11-12 北京百度网讯科技有限公司 Vehicle track recognition method, device, electronic equipment and storage medium
CN113688278A (en) * 2021-07-13 2021-11-23 北京旷视科技有限公司 Information processing method, device, electronic equipment and computer readable medium
CN113627497B (en) * 2021-07-27 2024-03-12 武汉大学 Space-time constraint-based cross-camera pedestrian track matching method
CN113610895A (en) * 2021-08-06 2021-11-05 烟台艾睿光电科技有限公司 Target tracking method and device, electronic equipment and readable storage medium
CN113673391A (en) * 2021-08-09 2021-11-19 浙江大华技术股份有限公司 Method, system, equipment and storage medium for video scene classification
CN113642454A (en) * 2021-08-11 2021-11-12 汇纳科技股份有限公司 Seat use condition identification method, system, equipment and computer storage medium
CN113642454B (en) * 2021-08-11 2024-03-01 汇纳科技股份有限公司 Seat use condition identification method, system, equipment and computer storage medium
CN113743260B (en) * 2021-08-23 2024-03-05 北京航空航天大学 Pedestrian tracking method under condition of dense pedestrian flow of subway platform
CN113743260A (en) * 2021-08-23 2021-12-03 北京航空航天大学 Pedestrian tracking method under dense pedestrian flow condition of subway platform
CN113744299A (en) * 2021-09-02 2021-12-03 上海安维尔信息科技股份有限公司 Camera control method and device, electronic equipment and storage medium
CN114120188A (en) * 2021-11-19 2022-03-01 武汉大学 Multi-pedestrian tracking method based on joint global and local features
CN114120188B (en) * 2021-11-19 2024-04-05 武汉大学 Multi-row person tracking method based on joint global and local features
CN114820700B (en) * 2022-04-06 2023-05-16 北京百度网讯科技有限公司 Object tracking method and device
CN114820700A (en) * 2022-04-06 2022-07-29 北京百度网讯科技有限公司 Object tracking method and device
CN115527162B (en) * 2022-05-18 2023-07-18 湖北大学 Multi-pedestrian re-identification method and system based on three-dimensional space
CN115527162A (en) * 2022-05-18 2022-12-27 湖北大学 Multi-pedestrian re-identification method and system based on three-dimensional space
CN115311820A (en) * 2022-07-11 2022-11-08 西安电子科技大学广州研究院 Intelligent security system near water
CN115497303A (en) * 2022-08-19 2022-12-20 招商新智科技有限公司 Expressway vehicle speed detection method and system under complex detection condition
WO2024108539A1 (en) * 2022-11-25 2024-05-30 京东方科技集团股份有限公司 Target people tracking method and apparatus
CN115690163B (en) * 2023-01-04 2023-05-09 中译文娱科技(青岛)有限公司 Target tracking method, system and storage medium based on image content
CN115690163A (en) * 2023-01-04 2023-02-03 中译文娱科技(青岛)有限公司 Target tracking method, system and storage medium based on image content
CN116363494B (en) * 2023-05-31 2023-08-04 睿克环境科技(中国)有限公司 Fish quantity monitoring and migration tracking method and system
CN116363494A (en) * 2023-05-31 2023-06-30 睿克环境科技(中国)有限公司 Fish quantity monitoring and migration tracking method and system
CN117315028A (en) * 2023-10-12 2023-12-29 北京多维视通技术有限公司 Method, device, equipment and medium for positioning fire point of outdoor fire scene
CN117315028B (en) * 2023-10-12 2024-04-30 北京多维视通技术有限公司 Method, device, equipment and medium for positioning fire point of outdoor fire scene
CN117058331B (en) * 2023-10-13 2023-12-19 山东建筑大学 Indoor personnel three-dimensional track reconstruction method and system based on single monitoring camera
CN117237418A (en) * 2023-11-15 2023-12-15 成都航空职业技术学院 Moving object detection method and system based on deep learning
CN118678225A (en) * 2024-08-15 2024-09-20 湖州奥酷斯特智能科技有限公司 Camera blind area evaluation and switching method and system based on real-time position

Also Published As

Publication number Publication date
TW202123171A (en) 2021-06-16
CN111145213A (en) 2020-05-12
TWI795667B (en) 2023-03-11

Similar Documents

Publication Publication Date Title
WO2021114702A1 (en) Target tracking method, apparatus and system, and computer-readable storage medium
CN109242913B (en) Method, device, equipment and medium for calibrating relative parameters of collector
CN109344755B (en) Video action recognition method, device, equipment and storage medium
WO2021223367A1 (en) Single lens-based multi-pedestrian online tracking method and apparatus, device, and storage medium
JP2020042816A (en) Object detection method, device, apparatus, storage media, and vehicle
US20240144793A1 (en) Information processing system, method and computer readable medium for determining whether moving bodies appearing in first and second videos are the same or not using histogram
US20120099765A1 (en) Method and system of video object tracking
TWI798815B (en) Target re-identification method, device, and computer readable storage medium
CN108648140B (en) Image splicing method, system, equipment and storage medium
WO2018213702A1 (en) Augmented reality system
US20160313799A1 (en) Method and apparatus for identifying operation event
CN111091584A (en) Target tracking method, device, equipment and storage medium
CN114998768B (en) Intelligent construction site management system and method based on unmanned aerial vehicle
JP2017130042A (en) Video processing apparatus, video processing method, and program
CN112291478A (en) Method, device and equipment for monitoring high-altitude falling object and storage medium
CN111210457A (en) Aircraft listing method combining video analysis and positioning information
CN114170556A (en) Target track tracking method and device, storage medium and electronic equipment
US10187610B2 (en) Controlling display based on an object position in an imaging space
CN113312951A (en) Dynamic video target tracking system, related method, device and equipment
WO2023036274A1 (en) Video processing method and apparatus, electronic device, medium and program product
CN113762017B (en) Action recognition method, device, equipment and storage medium
CN113012223B (en) Target flow monitoring method and device, computer equipment and storage medium
CN112184766B (en) Object tracking method and device, computer equipment and storage medium
JP2024520375A (en) Semi-automated data collection and correlation for multi-camera tracking
CN114461078A (en) Man-machine interaction method based on artificial intelligence

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20900355

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20900355

Country of ref document: EP

Kind code of ref document: A1