
WO2018058595A1 - Target detection method and device, and computer system - Google Patents


Info

Publication number
WO2018058595A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
detection
candidate
detection target
frame
Prior art date
Application number
PCT/CN2016/101237
Other languages
French (fr)
Chinese (zh)
Inventor
刘晓青
伍健荣
白向晖
Original Assignee
FUJITSU LIMITED (富士通株式会社)
刘晓青
伍健荣
白向晖
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by FUJITSU LIMITED (富士通株式会社), 刘晓青, 伍健荣, 白向晖
Priority to CN201680087590.3A priority Critical patent/CN109416728A/en
Priority to PCT/CN2016/101237 priority patent/WO2018058595A1/en
Publication of WO2018058595A1 publication Critical patent/WO2018058595A1/en

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition

Definitions

  • the present invention relates to the field of image processing, and in particular, to a target detection method, apparatus, and computer system.
  • target detection methods for video focus on dynamic information while ignoring static targets.
  • the deep convolutional neural network achieves higher precision; however, for video, this method is too time-consuming to be practical.
  • an embodiment of the present invention provides a target detection method, apparatus, and computer system, which use a deep neural network (DNN) as a detector to simultaneously detect dynamic and static targets, reducing time consumption.
  • a target detection method comprising:
  • for the first key frame of the video image, detecting the target on the first key frame based on the deep neural network, obtaining all detection targets on the first key frame, and assigning an identifier to each detection target;
  • for a normal frame after each key frame, determining the candidate region corresponding to each detection target on the current frame according to the target detection result of the previous frame, and repositioning each detection target according to the candidate region to obtain the tracking target on the current frame;
  • for the other key frames of the video image, detecting the targets on the other key frames based on the deep neural network, obtaining all detection targets on the other key frames, and integrating the detection targets on the other key frames according to the relocation result of the previous frame.
  • an object detecting apparatus wherein the apparatus comprises:
  • a first detecting unit, configured to, for the first key frame of the video image, detect the target on the first key frame based on the deep neural network, obtain all detection targets on the first key frame, and assign an identifier to each detection target;
  • a relocation unit, configured to, for the normal frames after each key frame, determine the candidate region corresponding to each detection target on the current frame according to the target detection result of the previous frame, and reposition each detection target according to the candidate region to obtain the tracking target on the current frame;
  • a second detecting unit, configured to, for the other key frames of the video image, detect the targets on the other key frames based on the deep neural network, obtain all detection targets on the other key frames, and integrate the detection targets on the other key frames according to the relocation result of the previous frame.
  • a computer system wherein the computer system comprises the apparatus of the aforementioned second aspect.
  • the beneficial effects of the embodiments of the present invention are that, by using the embodiments of the present invention, the accuracy of target detection can be improved and time consumption can be reduced.
  • FIG. 1 is a schematic diagram of a target detecting method of Embodiment 1;
  • FIG. 3 is a schematic diagram of target detection of key frames of a video image;
  • FIG. 4 is a schematic diagram of repositioning a detection target;
  • Figure 5 is a flow chart for relocating a detection target;
  • Figure 6 is an overall schematic view of one embodiment of repositioning a detection target;
  • Figure 7 is a schematic diagram of integration of detection targets on other key frames;
  • FIG. 8 is an overall schematic diagram of one embodiment of integrating detection targets on other key frames;
  • Figure 9 is a schematic diagram of a target detecting device of Embodiment 2.
  • Figure 10 is a schematic diagram of a relocating unit of the object detecting device of Embodiment 2;
  • Figure 11 is a schematic diagram of a second detecting unit of the object detecting device of Embodiment 2;
  • Figure 12 is a schematic illustration of a computer system of the third embodiment.
  • FIG. 1 is a schematic diagram of the method. As shown in FIG. 1, the method includes:
  • Step 101 For the first key frame of the video image, detect the target on the first key frame based on the deep neural network, obtain all detection targets on the first key frame, and assign an identifier to each detection target;
  • Step 102 For a normal frame after each key frame, determine the candidate region corresponding to each detection target on the current frame according to the target detection result of the previous frame, reposition each detection target according to the candidate region, and obtain the tracking target on the current frame;
  • Step 103 For other key frames of the video image, detect the targets on the other key frames based on the deep neural network, obtain all detection targets on the other key frames, and integrate the detection targets on the other key frames according to the relocation result of the previous frame.
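Steps 101–103 above amount to a per-frame loop over the video. A minimal sketch, assuming hypothetical helpers detect_dnn (the DNN detector of step 101), relocate (step 102), integrate (step 103), and a fixed key-frame interval n:

```python
def detect_video(frames, n, detect_dnn, relocate, integrate):
    """Key-frame/normal-frame pipeline sketch. Every n-th frame is a key
    frame processed by the DNN detector; frames in between only reposition
    the previous frame's targets. All three helpers are assumptions."""
    results, prev = [], None
    for i, frame in enumerate(frames):
        if i % n == 0:                                    # key frame
            detections = detect_dnn(frame)                # step 101
            if prev is not None:                          # other key frames:
                detections = integrate(detections, prev)  # step 103
            prev = detections
        else:                                             # normal frame
            prev = relocate(frame, prev)                  # step 102
        results.append(prev)
    return results
```

A small n runs the detector more often (higher accuracy, more computation); a large n runs it rarely (less time consumption), matching the trade-off described for n in this embodiment.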
  • FIG. 2 illustrates the overall architecture of the target detection method of the present embodiment.
  • the present embodiment performs target detection based on a deep neural network, and obtains all targets on each key frame, which is called a detection target.
  • for normal frames, target detection is no longer performed; instead, the target is repositioned based on the detection result/target repositioning result of the previous frame, and the result is called the tracking target.
  • the targets on the key frame are integrated based on the relocation result of the previous frame (a normal frame), so as to avoid losing targets, repeatedly identifying targets, or misidentifying targets.
  • target detection is performed only on key frames, and high detection accuracy can be achieved with less time consumption.
  • the target detection described above may be implemented by a DNN detector, that is, based on a deep neural network for target detection.
  • each detection target is assigned an identifier (ID) for indicating that detection target.
  • This embodiment does not limit the type of the identifier, and may be a specified number, or may indicate the attribute of the detection target and the like.
  • the value of n is not limited in this embodiment. The value of n trades off the accuracy of target detection against the amount of calculation (which correlates with time consumption: a large amount of calculation costs more time, a small amount less). To improve the accuracy of target detection, n can be set to a smaller value; to reduce the amount of calculation, that is, to reduce time consumption, n can be set to a larger value, preferably less than 10.
  • in step 102, for the normal frame immediately after a key frame, this embodiment repositions the detection targets according to the target detection result of that key frame (which yields the detection targets). For other normal frames, this embodiment repositions the tracking targets based on the target repositioning result of the previous normal frame (which yields the tracking targets). Since a tracking target obtained from the repositioning result corresponds to a detection target on the key frame, for convenience of explanation the target repositioning result is also referred to as the target detection result, and the tracking target as the detection target. That is, the target detection result mentioned in step 102 includes both the result of performing target detection on a key frame and the result of performing target relocation on a normal frame; similarly, the detection target mentioned in step 102 includes both the detection target obtained on a key frame and the tracking target obtained on a normal frame.
  • a candidate region of the detection target may be obtained by expanding a bounding box of the detection target.
  • Figure 4 illustrates a schematic diagram of target relocation for a certain detection target.
  • the embodiment may search a candidate region on the current frame for the target, so as to reposition the target within that region subsequently, as shown by the bolded rectangle in the right image of Figure 4, and assign the corresponding identifier to it.
  • an expansion parameter s can be used: the bounding box of the original target (the target on the previous frame) is enlarged to obtain the candidate region. If the size of the original target is B_w × B_h, the size of the expanded candidate region is s·B_w × s·B_h. The value of s can be set as needed, preferably greater than 1.5.
  • the above method of expanding the bounding box to obtain the candidate region is only an example; the embodiment is not limited thereto, and other practicable expansion methods are also applicable.
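As a concrete illustration of the expansion, the sketch below enlarges a bounding box about its center by a factor s (the text only requires the factor to be settable, preferably greater than 1.5; centering the enlarged box and clipping it to the image are assumptions):

```python
def expand_bbox(x, y, b_w, b_h, s=2.0, img_w=None, img_h=None):
    """Expand the (x, y, b_w, b_h) box of the previous-frame target by
    factor s to obtain the candidate region, optionally clipped to the
    image. s=2.0 is a hypothetical choice satisfying s > 1.5."""
    c_x, c_y = x + b_w / 2.0, y + b_h / 2.0   # box center
    s_w, s_h = s * b_w, s * b_h               # expanded size
    nx, ny = c_x - s_w / 2.0, c_y - s_h / 2.0
    if img_w is not None:                     # clip to image bounds
        nx = max(0.0, nx)
        s_w = min(s_w, img_w - nx)
    if img_h is not None:
        ny = max(0.0, ny)
        s_h = min(s_h, img_h - ny)
    return nx, ny, s_w, s_h
```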
  • step 102 a candidate region for the detection target is obtained, and the detection target can be relocated according to the candidate region.
  • Figure 5 provides a relocation method. As shown in Figure 5, the method includes:
  • Step 501 traverse the candidate area by using a predetermined step size to obtain a plurality of candidate targets corresponding to the detection target;
  • Step 502 Calculate the similarity between each candidate target and the detection target.
  • Step 503 Determine a candidate target that matches the detection target according to the similarity degree, and use the matched candidate target as the tracking target of the detection target on the current frame.
  • if the similarity between exactly one candidate target and the detection target is greater than the first threshold, that candidate target is used as the tracking target, and the tracking number corresponding to the detection target is reset.
  • if the similarities between a plurality of candidate targets and the detection target are greater than the first threshold, the candidate target having the greatest similarity is selected from the plurality of candidate targets as the tracking target, and the tracking number corresponding to the detection target is reset.
  • if the similarity between every candidate target and the detection target is not greater than the first threshold, it is determined whether the tracking number corresponding to the detection target is 0; if the tracking number is not 0, it is decreased by 1 and the detection target is retained on the current frame; if the tracking number is 0, the detection target is not retained on the current frame.
  • the step size d may be set to less than 4 pixels to ensure that each candidate target contains more than 50 pixels, but this embodiment is not limited thereto.
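The traversal of step 501 can be sketched as a sliding window over the candidate region; taking the window size equal to the detection target's size, and the grid layout, are assumptions consistent with Figure 4:

```python
def candidate_windows(region, t_w, t_h, d):
    """Traverse the (x, y, w, h) candidate region with step size d and
    return every candidate window of the target's size t_w x t_h."""
    rx, ry, rw, rh = region
    wins = []
    y = ry
    while y + t_h <= ry + rh:
        x = rx
        while x + t_w <= rx + rw:
            wins.append((x, y, t_w, t_h))
            x += d                 # d < 4 px keeps the sampling dense
        y += d
    return wins
```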
  • whether the candidate target matches the detection target can be determined by calculating the similarity between the image features.
  • a gradient difference can be used, but the embodiment is not limited thereto; other methods of calculating the similarity are also applicable. Since calculating the similarity is prior art, it is not described in detail in this embodiment.
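One possible realization of the gradient-difference similarity (the text leaves the formula to prior art, so the exact form below is an assumption) compares finite-difference gradients of two equal-sized grayscale patches:

```python
def gradient_similarity(a, b):
    """Similarity of two equal-sized 2-D grayscale patches (nested lists)
    via the sum of absolute gradient differences, mapped to (0, 1]; 1.0
    means identical gradients. The mapping 1/(1+diff) is hypothetical."""
    def grads(p):
        h, w = len(p), len(p[0])
        gx = [[p[i][j + 1] - p[i][j] for j in range(w - 1)] for i in range(h)]
        gy = [[p[i + 1][j] - p[i][j] for j in range(w)] for i in range(h - 1)]
        return gx, gy
    ax, ay = grads(a)
    bx, by = grads(b)
    diff = sum(abs(u - v) for ra, rb in zip(ax, bx) for u, v in zip(ra, rb))
    diff += sum(abs(u - v) for ra, rb in zip(ay, by) for u, v in zip(ra, rb))
    return 1.0 / (1.0 + diff)
```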
  • the candidate target matching the detection target is found by means of the first threshold. However, if the similarity of every candidate target to the detection target is not greater than the first threshold, that is, the detection target is not found on the current frame, this embodiment does not directly remove the detection target from the current frame; instead, the detection target is retained for several frames via the keep_track parameter, to avoid judgment errors.
  • a parameter called keep_track is set for each detection target on the key frame. If there is no candidate target on the current frame matching the original detection target, this embodiment does not immediately delete the identifier of the detection target; instead, the value of keep_track is decremented by one and the detection target is retained without updating its position, until the value of keep_track equals zero. If the detection target matched a candidate target on a previous frame before keep_track reached 0, keep_track is reset.
  • the value of keep_track can be set in the range [3, 5], that is, to 3, 4, or 5. However, the embodiment is not limited thereto, and it can also be set to other values.
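The matching decision of steps 502–503 together with the keep_track bookkeeping can be sketched per detection target as follows; keep_track_init=4 is one value inside the suggested [3, 5] range, and the similarity threshold 0.5 is hypothetical:

```python
def update_track(similarities, keep_track, keep_track_init=4, threshold=0.5):
    """Decide the fate of one detection target on the current frame.
    similarities: scores of all candidate targets against the detection
    target. Returns (matched_candidate_index_or_None, keep_alive,
    new_keep_track)."""
    above = [(s, i) for i, s in enumerate(similarities) if s > threshold]
    if above:                        # best-matching candidate becomes the
        best_i = max(above)[1]       # tracking target; keep_track is reset
        return best_i, True, keep_track_init
    if keep_track > 0:               # no match: retain the target a few
        return None, True, keep_track - 1   # more frames, position frozen
    return None, False, 0            # keep_track exhausted: drop the target
```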
  • each frame of the video image (key frames and normal frames) may be downsampled, for example by a fixed factor, before the method of this embodiment is applied.
  • the target area or the candidate target area of each frame of the image is down-sampled.
  • the specific implementation manner of the downsampling is not limited, and the existing means may be used.
  • FIG. 6 is an overall flowchart of an embodiment of a relocation method according to the embodiment. Referring to FIG. 6, the method includes:
  • Step 601 Downsampling a target area of the previous frame image and performing feature extraction
  • Step 602 Downsampling a candidate target area of the current frame image and performing feature extraction.
  • Step 603 Feature matching, and obtaining a matching score
  • Step 604 Determine whether the matching score is greater than the first threshold, if the determination is yes, proceed to step 605; otherwise, perform step 606;
  • Step 605 Select an optimal candidate target
  • Step 606 Decrease keep_track by one
  • Step 607 Determine whether the keep_track is 0. If the determination is no, the position of the detection target is reserved in the current frame; otherwise, the position of the detection target is not retained in the current frame.
  • in steps 601 and 602, feature extraction refers to extracting the features of the detection targets on the previous frame image and of the candidate targets on the current frame image, for feature matching such as similarity calculation.
  • the execution order of step 601 and step 602 is not limited in this embodiment, and may be performed simultaneously or separately.
  • after the last normal frame before the next key frame has been processed by step 102, this embodiment proceeds to the processing of the next key frame (that is, a key frame other than the first key frame) in step 103.
  • in step 103, for the next key frame, in addition to the same target detection process as in step 101, the detection targets on this key frame are also integrated based on the relocation result of the previous frame (the last normal frame described above); that is, target identifiers are assigned to the detection results of the current key frame according to the tracking targets in the previous normal frame, thereby determining whether a target has left, whether a new target has entered, and so on.
  • Figure 7 provides an integration method. As shown in Figure 7, the method includes:
  • Step 701 Match each detection target on the other key frames with each candidate target on the previous frame;
  • Step 702 If the overlapping area of the detection target and the candidate target is greater than a second threshold, and the matching score of the detection target and the candidate target is greater than a first threshold, assign the identifier of the matched candidate target to the detection target;
  • Step 703 If the overlapping area of the detection target and the candidate target is not greater than the second threshold, or the matching score of the detection target and the candidate target is not greater than the first threshold, assign a new identifier to the detection target.
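The overlapping area used in steps 702–703 corresponds to the intersection-over-union (IOU) measure described for Figure 8; a standard computation for axis-aligned (x, y, w, h) boxes:

```python
def iou(a, b):
    """Intersection over union of two axis-aligned (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))  # overlap width
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))  # overlap height
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0
```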
  • FIG. 8 is an overall flowchart of an embodiment of the integration method of the embodiment. Referring to FIG. 8, the method includes:
  • Step 801 Compute the IOU (Intersection over Union) overlap matrix;
  • Step 802 Determine whether IOU_ij is greater than the second threshold. If the determination is yes, perform step 803; otherwise assign a new identifier to target i;
  • Step 803 Image feature matching
  • Step 804 Determine whether the matching score is greater than the first threshold. If the determination is yes, assign the identifier of target j to target i, at which point the IOU matrix becomes an (N-1)×(M-1) matrix, and perform step 805; otherwise, assign a new identifier to target i, the IOU matrix becoming an (N-1)×M matrix;
  • Step 805 Determine whether the number of IOU rows is greater than 0. If the determination is yes, go back to step 801, otherwise go to step 806;
  • Step 806 Determine whether the number of IOU columns is greater than 0. If the determination is yes, the unmatched candidate targets are not retained on the current key frame; otherwise, the process ends.
  • the overlap between the detection targets from the new key frame and the tracking targets from the previous normal frame is first calculated, and an N×M IOU (Intersection over Union) matrix is obtained.
  • the process of image feature matching is the same as the foregoing, for example, by calculating a similarity (matching score), and details are not described herein again.
  • if the tracking target j matches the detection target i, the j-th column is deleted from the IOU matrix, that is, target j is no longer matched. Therefore, for robustness, the row maxima of the IOU matrix are first sorted in descending order and the rows of the IOU matrix are rearranged accordingly, and the matching is then performed starting from the first row of the matrix. If all detection targets in the key frame have been assigned an identifier and there remain unmatched tracking targets from the normal frame, these targets are removed. Through such processing, it is possible to determine whether a target has left the visual range, whether a new target has entered, and so on, and to derive the number of targets within the visual range.
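The row-sorted greedy matching described above can be sketched as follows. iou_fn and match_fn stand for the IOU and image-feature matching already discussed; the thresholds and the numbering of new identifiers are assumptions:

```python
def integrate_key_frame(detections, tracked, iou_fn, match_fn,
                        t_iou=0.3, t_match=0.5, next_id=1000):
    """Integrate new-key-frame detections with tracked targets from the
    previous normal frame (steps 801-806). Rows (detections) are processed
    in order of decreasing row maximum of the IOU matrix; a matched
    detection inherits the tracked target's id, an unmatched one gets a
    new id, and leftover tracked targets are simply not carried over."""
    m = [[iou_fn(d["box"], t["box"]) for t in tracked] for d in detections]
    order = sorted(range(len(detections)),
                   key=lambda i: max(m[i], default=0.0), reverse=True)
    free = set(range(len(tracked)))         # columns not yet matched
    ids = {}
    for i in order:
        best, best_j = 0.0, None
        for j in free:
            if m[i][j] > best:
                best, best_j = m[i][j], j
        if (best_j is not None and best > t_iou
                and match_fn(detections[i], tracked[best_j]) > t_match):
            ids[i] = tracked[best_j]["id"]  # inherit the identifier
            free.discard(best_j)            # delete the matched column
        else:
            ids[i] = next_id                # a new target has entered
            next_id += 1
    return [dict(detections[i], id=ids[i]) for i in range(len(detections))]
```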
  • performing target detection only on the key frames of the video image and target relocation on the normal frames can improve the accuracy of target detection and reduce time consumption.
  • the embodiment of the present invention provides a target detection device.
  • since the principle of the device is similar to that of Embodiment 1, its specific implementation may refer to the description of the method in Embodiment 1.
  • FIG. 9 is a schematic diagram of the object detecting apparatus of the present embodiment. As shown in FIG. 9, the apparatus 900 includes a first detecting unit 901, a relocating unit 902, and a second detecting unit 903.
  • the first detecting unit 901 is configured to, for the first key frame of the video image, detect the target on the first key frame based on the deep neural network, obtain all detection targets on the first key frame, and assign an identifier to each detection target.
  • the relocation unit 902 is configured to, for the normal frames after each key frame, determine the candidate region corresponding to each detection target on the current frame according to the target detection result of the previous frame, and reposition each detection target according to the candidate region to obtain the tracking target on the current frame;
  • the second detecting unit 903 is configured to, for the other key frames of the video image, detect the targets on the other key frames based on the deep neural network, obtain all detection targets on the other key frames, and integrate the detection targets on the other key frames according to the relocation result of the previous frame.
  • FIG. 10 is a schematic diagram of one embodiment of the relocating unit 902 of the present embodiment.
  • the relocation unit 902 may include an extension unit 1001 that expands a bounding box of the detection target to obtain a candidate region of the detection target.
  • the relocating unit 902 may further include a traversing unit 1002, a calculating unit 1003, and a determining unit 1004, where the traversing unit 1002 may traverse the candidate area with a predetermined step size to obtain a plurality of candidate targets corresponding to the detection target;
  • the calculating unit 1003 can calculate a similarity between each candidate target and the detection target;
  • the determining unit 1004 can determine the candidate target that matches the detection target according to the similarity, and take the matched candidate target as the tracking target of the detection target on the current frame.
  • the determining unit 1004 may use the candidate target as the tracking target when the similarity between the candidate target and the detection target is greater than the first threshold, and reset the tracking number corresponding to the detection target.
  • the determining unit 1004 may select, as the tracking target, the candidate target having the largest similarity from the plurality of candidate targets when the similarity between the plurality of candidate targets and the detection target is greater than the first threshold. And reset the number of traces corresponding to the detection target.
  • the determining unit 1004 may further determine, when the similarity between every candidate target and the detection target is not greater than the first threshold, whether the tracking number is 0; if the tracking number is not 0, it is decreased by 1 and the detection target is retained on the current frame; if the tracking number is 0, the detection target is not retained on the current frame.
  • the apparatus 900 may further include a downsampling unit 904, which may downsample the target area/candidate target area on the key frames and/or normal frames of the video image for image feature matching, as described above; details are not repeated here.
  • FIG. 11 is a schematic diagram of an embodiment of the second detecting unit 903 of the present embodiment.
  • the second detecting unit 903 may include a matching unit 1101 and a processing unit 1102 .
  • the matching unit 1101 is configured to match each detection target on the other key frames with each candidate target on the previous frame; the processing unit 1102 is configured to assign the identifier of the matched candidate target to the detection target when the overlapping area of the detection target and the candidate target is greater than the second threshold and their matching score is greater than the first threshold, and to assign a new identifier to the detection target when the overlapping area is not greater than the second threshold or the matching score is not greater than the first threshold.
  • the processing unit 1102 may also, after assigning an identifier to all detection targets on the other key frames, not retain on the other key frames the candidate targets from the previous frame that remain unmatched.
  • target detection is performed only on key frames of the video image, and target relocation of the normal frame of the video image is performed, which can improve the accuracy of target detection and reduce time consumption.
  • This embodiment also provides a computer system configured with the target detection device 900 as described above.
  • FIG. 12 is a schematic block diagram showing the system configuration of a computer system 1200 according to an embodiment of the present invention.
  • the computer system 1200 can include a central processor 1201 and a memory 1202; the memory 1202 is coupled to the central processor 1201.
  • the figure is exemplary; other types of structures may be used in addition to or in place of the structure to implement telecommunications functions or other functions.
  • the functionality of the target detection device 900 can be integrated into the central processor 1201.
  • the central processing unit 1201 may be configured to implement the target detection method described in Embodiment 1.
  • the central processing unit 1201 can be configured to perform control as follows: for the first key frame of the video image, detecting the target on the first key frame based on the deep neural network, obtaining all detection targets on the first key frame, and assigning an identifier to each detection target; for a normal frame after each key frame, determining the candidate region corresponding to each detection target on the current frame according to the target detection result of the previous frame, repositioning each detection target according to the candidate region, and obtaining the tracking target on the current frame; for other key frames of the video image, detecting the targets on the other key frames based on the deep neural network, obtaining all detection targets on the other key frames, and integrating the detection targets on the other key frames according to the relocation result of the previous frame.
  • the target detecting device 900 can be configured separately from the central processing unit 1201.
  • for example, the target detecting device 900 can be configured as a chip connected to the central processor 1201, with the functions of the target detecting device 900 realized under the control of the central processor 1201.
  • the computer system 1200 can further include an input unit 1203, an audio processing unit 1204, a display 1205, and a power supply 1206. It should be noted that the computer system 1200 does not necessarily include all of the components shown in FIG. 12; in addition, the computer system 1200 may also include components not shown in FIG. 12, and reference may be made to the prior art.
  • central processor 1201, also sometimes referred to as a controller or operational control, may include a microprocessor or other processor device and/or logic device that receives input and controls the operation of various portions of the computer system 1200.
  • the memory 1202 may be, for example, one or more of a buffer, a flash memory, a hard drive, a removable medium, a volatile memory, a non-volatile memory, or other suitable device.
  • the memory 1202 can store the above video images, feature-matching data, and the like, and can store a program for processing the related information.
  • the central processing unit 1201 can execute the program stored by the memory 1202 to implement information storage or processing and the like.
  • the functions of other components are similar to those of the existing ones and will not be described here.
  • the various components of computer system 1200 may be implemented by special purpose hardware, firmware, software or a combination thereof without departing from the scope of the invention.
  • the computer system may be a video surveillance system, but is not limited thereto.
  • target detection is performed only on key frames of the video image, and target relocation of the normal frame of the video image is performed, which can improve the accuracy of target detection and reduce time consumption.
  • Embodiments of the present invention also provide a computer readable program which, when executed in a target detecting device or a computer system, causes the target detecting device or computer system to perform the target detection method described in Embodiment 1.
  • the embodiment of the present invention further provides a storage medium storing a computer readable program, wherein the computer readable program causes the target detecting device or the computer system to execute the target detecting method described in Embodiment 1.
  • the above apparatus and method of the present invention may be implemented by hardware or by hardware in combination with software.
  • the present invention relates to a computer readable program that, when executed by a logic component, enables the logic component to implement the apparatus or components described above, or to implement the various methods or steps described above.
  • the present invention also relates to a storage medium for storing the above program, such as a hard disk, a magnetic disk, an optical disk, a DVD, a flash memory, or the like.
  • the object detection method in the object detection apparatus described in connection with the embodiment of the present invention may be directly embodied as hardware, a software module executed by the processor, or a combination of both.
  • one or more of the functional blocks shown in Figures 9-11 and/or one or more combinations of functional blocks may correspond to individual software modules of a computer program flow, or to individual hardware modules.
  • These software modules may correspond to the respective steps shown in FIGS. 1, 5, and 7, respectively.
  • These hardware modules can be implemented, for example, by curing these software modules using a Field Programmable Gate Array (FPGA).
  • the software module can reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, removable disk, CD-ROM, or any other form of storage medium known in the art.
  • a storage medium can be coupled to the processor to enable the processor to read information from, and write information to, the storage medium; or the storage medium can be an integral part of the processor.
  • the processor and the storage medium can be located in an ASIC.
  • the software module can be stored in the memory of the mobile terminal or in a memory card that can be inserted into the mobile terminal.
  • the software module can be stored in the MEGA-SIM card or a large-capacity flash memory device.
  • One or more of the functional blocks described with respect to Figures 9-11 and/or one or more combinations of functional blocks may be implemented as a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device designed to perform the functions described herein.
  • One or more of the functional blocks described with respect to Figures 9-11 and/or one or more combinations of functional blocks may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, multiple microprocessors, one or more microprocessors in conjunction with a DSP, or any other such configuration.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Image Analysis (AREA)

Abstract

A target detection method and device, and computer system. The method comprises: performing, with respect to a first key frame of video images and on the basis of a deep neural network, detection of a target in the first key frame to obtain all detection targets in the first key frame, and allocating an identifier to each of the detection targets (101); determining, with respect to each ordinary frame following the key frame and according to the target detection result of a previous frame, a candidate region corresponding to each detection target in a current frame, and re-positioning, according to the candidate region, said detection target to obtain a tracking target in the current frame (102); and performing, with respect to another key frame of the video images and on the basis of the deep neural network, detection of a target in the other key frame to obtain all detection targets in the other key frame, and performing, according to the re-positioning result of a previous frame, integration of the detection targets in the other key frame (103). The method of the present invention improves target detection precision and efficiency.

Description

目标检测方法、装置以及计算机系统 Target detection method, device and computer system

技术领域 Technical Field
本发明涉及图像处理领域,特别涉及一种目标检测方法、装置以及计算机系统。The present invention relates to the field of image processing, and in particular, to a target detection method, apparatus, and computer system.
背景技术 Background Art
目前，针对视频的目标检测方法注重于动态信息而忽略了静态目标。在针对图像的目标检测中，深度卷积神经网络实现了较高的精度，然而，针对视频，这种方法非常耗时并且不合格。Currently, target detection methods for video focus on dynamic information while ignoring static targets. In target detection for images, deep convolutional neural networks achieve high accuracy; however, for video, such methods are very time-consuming and therefore unsuitable.
应该注意,上面对技术背景的介绍只是为了方便对本发明的技术方案进行清楚、完整的说明,并方便本领域技术人员的理解而阐述的。不能仅仅因为这些方案在本发明的背景技术部分进行了阐述而认为上述技术方案为本领域技术人员所公知。It should be noted that the above description of the technical background is only for the purpose of facilitating a clear and complete description of the technical solutions of the present invention, and is convenient for understanding by those skilled in the art. The above technical solutions are not considered to be well known to those skilled in the art simply because these aspects are set forth in the background section of the present invention.
发明内容Summary of the invention
为了解决背景技术指出的问题，本发明实施例提供一种目标检测方法、装置以及计算机系统，其以深度神经网络（DNN，Deep Neural Network）作为检测器，同时检测动态和静态目标，减少了时间。In order to solve the problems pointed out in the background art, embodiments of the present invention provide a target detection method, apparatus, and computer system, which use a deep neural network (DNN) as a detector to detect dynamic and static targets simultaneously, reducing time consumption.
根据本实施例的第一方面,提供了一种目标检测方法,其中,所述方法包括:According to a first aspect of the present invention, a target detection method is provided, wherein the method comprises:
对于视频图像的第一个关键帧，基于深度神经网络对所述第一个关键帧上的目标进行检测，得到所述第一个关键帧上的所有检测目标，并为每个检测目标分配标识；For the first key frame of the video image, detecting the targets on the first key frame based on the deep neural network to obtain all detection targets on the first key frame, and assigning an identifier to each detection target;
对于每个关键帧之后的普通帧，根据上一帧的目标检测结果，确定当前帧上对应每个检测目标的候选区域，根据所述候选区域对每个检测目标进行重新定位，得到当前帧上的追踪目标；For each normal frame after a key frame, determining, according to the target detection result of the previous frame, a candidate region corresponding to each detection target on the current frame, and repositioning each detection target according to the candidate region to obtain the tracking targets on the current frame;
对于视频图像的其他关键帧，基于深度神经网络对所述其他关键帧上的目标进行检测，得到所述其他关键帧上的所有检测目标，并根据上一帧的重新定位结果，对所述其他关键帧上的检测目标进行整合。For the other key frames of the video image, detecting the targets on the other key frames based on the deep neural network to obtain all detection targets on the other key frames, and integrating the detection targets on the other key frames according to the repositioning result of the previous frame.
根据本实施例的第二方面,提供了一种目标检测装置,其中,所述装置包括:According to a second aspect of the present invention, there is provided an object detecting apparatus, wherein the apparatus comprises:
第一检测单元，其对于视频图像的第一个关键帧，基于深度神经网络对所述第一个关键帧上的目标进行检测，得到所述第一个关键帧上的所有检测目标，并为每个检测目标分配标识；a first detecting unit which, for the first key frame of the video image, detects the targets on the first key frame based on the deep neural network to obtain all detection targets on the first key frame, and assigns an identifier to each detection target;
重定位单元，其对于每个关键帧之后的普通帧，根据上一帧的目标检测结果，确定当前帧上对应每个检测目标的候选区域，根据所述候选区域对每个检测目标进行重新定位，得到当前帧上的追踪目标；a relocation unit which, for each normal frame after a key frame, determines, according to the target detection result of the previous frame, a candidate region corresponding to each detection target on the current frame, and repositions each detection target according to the candidate region to obtain the tracking targets on the current frame;
第二检测单元，其对于视频图像的其他关键帧，基于深度神经网络对所述其他关键帧上的目标进行检测，得到所述其他关键帧上的所有检测目标，并根据上一帧的重新定位结果，对所述其他关键帧上的检测目标进行整合。a second detecting unit which, for the other key frames of the video image, detects the targets on the other key frames based on the deep neural network to obtain all detection targets on the other key frames, and integrates the detection targets on the other key frames according to the repositioning result of the previous frame.
根据本实施例的第三方面,提供了一种计算机系统,其中,所述计算机系统包括前述第二方面所述的装置。According to a third aspect of the present invention, there is provided a computer system, wherein the computer system comprises the apparatus of the aforementioned second aspect.
本发明实施例的有益效果在于:通过本发明实施例,能够提高目标检测的精度并减少时间消耗。The beneficial effects of the embodiments of the present invention are that, by using the embodiments of the present invention, the accuracy of target detection can be improved and time consumption can be reduced.
参照后文的说明和附图，详细公开了本发明的特定实施方式，指明了本发明的原理可以被采用的方式。应该理解，本发明的实施方式在范围上并不因而受到限制。在所附权利要求的条款的范围内，本发明的实施方式包括许多改变、修改和等同。Specific embodiments of the present invention are disclosed in detail with reference to the following description and drawings, indicating the manner in which the principles of the invention may be employed. It should be understood that the embodiments of the invention are not thereby limited in scope. Within the scope of the terms of the appended claims, the embodiments of the invention include many changes, modifications and equivalents.
针对一种实施方式描述和/或示出的特征可以以相同或类似的方式在一个或更多个其它实施方式中使用，与其它实施方式中的特征相组合，或替代其它实施方式中的特征。Features described and/or illustrated with respect to one embodiment may be used in the same or similar manner in one or more other embodiments, in combination with, or in place of, features in other embodiments.
应该强调，术语"包括/包含"在本文使用时指特征、整件、步骤或组件的存在，但并不排除一个或更多个其它特征、整件、步骤或组件的存在或附加。It should be emphasized that the term "comprises/comprising", when used herein, refers to the presence of a feature, integer, step or component, but does not exclude the presence or addition of one or more other features, integers, steps or components.
附图说明DRAWINGS
在本发明实施例的一个附图或一种实施方式中描述的元素和特征可以与一个或更多个其它附图或实施方式中示出的元素和特征相结合。此外，在附图中，类似的标号表示几个附图中对应的部件，并可用于指示多于一种实施方式中使用的对应部件。The elements and features described in one drawing or embodiment of the embodiments of the present invention may be combined with the elements and features shown in one or more other drawings or embodiments. Furthermore, in the drawings, like reference numerals designate corresponding parts throughout the several figures, and may be used to designate corresponding parts used in more than one embodiment.
所包括的附图用来提供对本发明实施例的进一步的理解，其构成了说明书的一部分，用于例示本发明的实施方式，并与文字描述一起来阐释本发明的原理。显而易见地，下面描述中的附图仅仅是本发明的一些实施例，对于本领域普通技术人员来讲，在不付出创造性劳动性的前提下，还可以根据这些附图获得其他的附图。在附图中：The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention, constitute a part of the specification, illustrate embodiments of the invention, and together with the description serve to explain the principles of the invention. Obviously, the drawings in the following description are only some embodiments of the present invention, and those skilled in the art can obtain other drawings from these drawings without inventive effort. In the drawings:
图1是实施例1的目标检测方法的示意图；FIG. 1 is a schematic diagram of the target detection method of Embodiment 1;
图2是实施例1的目标检测方法的架构示意图；FIG. 2 is a schematic diagram of the architecture of the target detection method of Embodiment 1;
图3是对视频图像的关键帧进行目标检测的示意图；FIG. 3 is a schematic diagram of target detection on key frames of a video image;
图4是对检测目标进行重定位的示意图；FIG. 4 is a schematic diagram of repositioning a detection target;
图5是对检测目标进行重定位的流程图；FIG. 5 is a flow chart of repositioning a detection target;
图6是对检测目标进行重定位的一个实施方式的整体示意图；FIG. 6 is an overall schematic diagram of an embodiment of repositioning a detection target;
图7是对其他关键帧上的检测目标进行整合的示意图；FIG. 7 is a schematic diagram of integrating detection targets on other key frames;
图8是对其他关键帧上的检测目标进行整合的一个实施方式的整体示意图；FIG. 8 is an overall schematic diagram of an embodiment of integrating detection targets on other key frames;
图9是实施例2的目标检测装置的示意图；FIG. 9 is a schematic diagram of the target detection apparatus of Embodiment 2;
图10是实施例2的目标检测装置的重定位单元的示意图；FIG. 10 is a schematic diagram of the relocation unit of the target detection apparatus of Embodiment 2;
图11是实施例2的目标检测装置的第二检测单元的示意图；FIG. 11 is a schematic diagram of the second detecting unit of the target detection apparatus of Embodiment 2;
图12是实施例3的计算机系统的示意图。FIG. 12 is a schematic diagram of the computer system of Embodiment 3.
具体实施方式 Detailed Description
参照附图，通过下面的说明书，本发明的前述以及其它特征将变得明显。在说明书和附图中，具体公开了本发明的特定实施方式，其表明了其中可以采用本发明的原则的部分实施方式，应了解的是，本发明不限于所描述的实施方式，相反，本发明包括落入所附权利要求的范围内的全部修改、变型以及等同物。下面结合附图对本发明的各种实施方式进行说明。这些实施方式只是示例性的，不是对本发明的限制。The foregoing and other features of the present invention will become apparent from the following description with reference to the accompanying drawings. In the specification and the drawings, specific embodiments of the invention are disclosed, which illustrate some of the embodiments in which the principles of the invention may be employed. It should be understood that the invention is not limited to the described embodiments; on the contrary, the invention includes all modifications, variations and equivalents falling within the scope of the appended claims. Various embodiments of the invention are described below with reference to the accompanying drawings. These embodiments are merely exemplary and are not limiting of the invention.
下面结合附图对本发明实施例进行说明。The embodiments of the present invention will be described below with reference to the accompanying drawings.
实施例1Example 1
本实施例提供了一种目标检测方法,图1是该方法的示意图,如图1所示,该方法包括:This embodiment provides a target detection method, and FIG. 1 is a schematic diagram of the method. As shown in FIG. 1, the method includes:
步骤101：对于视频图像的第一个关键帧，基于深度神经网络对所述第一个关键帧上的目标进行检测，得到所述第一个关键帧上的所有检测目标，并为每个检测目标分配标识；Step 101: For the first key frame of the video image, detect the targets on the first key frame based on the deep neural network to obtain all detection targets on the first key frame, and assign an identifier to each detection target;
步骤102：对于每个关键帧之后的普通帧，根据上一帧的目标检测结果，确定当前帧上对应每个检测目标的候选区域，根据所述候选区域对每个检测目标进行重新定位，得到当前帧上的追踪目标；Step 102: For each normal frame after a key frame, determine, according to the target detection result of the previous frame, a candidate region corresponding to each detection target on the current frame, and reposition each detection target according to the candidate region to obtain the tracking targets on the current frame;
步骤103：对于视频图像的其他关键帧，基于深度神经网络对所述其他关键帧上的目标进行检测，得到所述其他关键帧上的所有检测目标，并根据上一帧的重新定位结果，对所述其他关键帧上的检测目标进行整合。Step 103: For the other key frames of the video image, detect the targets on the other key frames based on the deep neural network to obtain all detection targets on the other key frames, and integrate the detection targets on the other key frames according to the repositioning result of the previous frame.
图2示意了本实施例的目标检测方法的整体架构，如图2所示，对于关键帧，本实施例基于深度神经网络进行目标检测，得到每个关键帧上的所有目标，称为检测目标；对于普通帧，不再进行目标检测，而是基于上一帧的检测结果/目标重新定位结果，重新定位目标，称为追踪目标。此外，对于除了第一个关键帧以外的关键帧，不仅检测该关键帧上的所有目标，还有基于前一帧（普通帧）的重新定位结果，对该关键帧上的目标进行整合，以免丢失目标或者重复识别目标或者错误识别目标。FIG. 2 illustrates the overall architecture of the target detection method of this embodiment. As shown in FIG. 2, for key frames, this embodiment performs target detection based on the deep neural network to obtain all targets on each key frame, called detection targets; for normal frames, target detection is no longer performed; instead, the targets are repositioned based on the detection result / target repositioning result of the previous frame, and are called tracking targets. In addition, for key frames other than the first key frame, not only are all targets on the key frame detected, but the targets on the key frame are also integrated based on the repositioning result of the previous frame (a normal frame), so as to avoid losing targets, repeatedly identifying targets, or misidentifying targets.
通过本实施例的方法,只对关键帧进行目标检测,能够以较少的时间消耗实现较高的检测精度。With the method of the embodiment, target detection is performed only on key frames, and high detection accuracy can be achieved with less time consumption.
在步骤101中,上述目标检测可以通过DNN检测器来实现,也即,基于深度神经网络进行目标检测。对于深度神经网络的工作原理,可以参考现有技术,本实施例不再详细说明。通过步骤101对关键帧上的目标的检测,可以得到关键帧上的所有目标(称为检测目标),如图3所示,每个检测目标会被分配一个标识(ID)用于指示该检测目标。本实施例对该标识的类型不作限制,其可以是一个指定的数字编号,也可以指示该检测目标的属性等。In step 101, the target detection described above may be implemented by a DNN detector, that is, based on a deep neural network for target detection. For the working principle of the deep neural network, reference may be made to the prior art, which is not described in detail in this embodiment. By detecting the target on the key frame in step 101, all the targets on the key frame (referred to as detection targets) can be obtained. As shown in FIG. 3, each detection target is assigned an identifier (ID) for indicating the detection. aims. This embodiment does not limit the type of the identifier, and may be a specified number, or may indicate the attribute of the detection target and the like.
在本实施例中，如图2所示，在关键帧之后跟着n个普通帧，本实施例对n的取值不作限制，n的取值可以考虑目标检测的精度和计算量（与时间消耗相关，计算量大，时间消耗多，计算量小，时间消耗少），如果希望提高目标检测的精度，可以将n设置为较小的值，如果希望降低计算量，也即减少时间消耗，n可以设置为较大的值，优选可以小于10。In this embodiment, as shown in FIG. 2, each key frame is followed by n normal frames. The value of n is not limited in this embodiment; it can be chosen by weighing detection accuracy against the amount of computation (which relates to time consumption: more computation means more time, less computation means less time). If higher detection accuracy is desired, n can be set to a smaller value; if less computation, i.e., less time consumption, is desired, n can be set to a larger value, preferably less than 10.
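As an illustrative sketch only (not part of the original disclosure), the key-frame / normal-frame scheduling described above might be expressed as follows; `detect_dnn`, `relocate` and `integrate` are hypothetical stand-ins for the DNN detector, the repositioning step, and the key-frame integration step:

```python
def process_video(frames, n, detect_dnn, relocate, integrate):
    """Dispatch frames: every (n+1)-th frame is treated as a key frame.

    detect_dnn(frame) -> list of detections on a key frame;
    relocate(prev, frame) -> tracked targets on a normal frame;
    integrate(detections, prev) -> detections with consistent identifiers.
    All three callables are hypothetical interfaces supplied by the caller.
    """
    results = []
    prev = None
    for i, frame in enumerate(frames):
        if i % (n + 1) == 0:           # key frame: run the DNN detector
            dets = detect_dnn(frame)
            if prev is not None:       # not the first key frame: merge IDs
                dets = integrate(dets, prev)
            prev = dets
        else:                          # normal frame: reposition only
            prev = relocate(prev, frame)
        results.append(prev)
    return results
```

With n = 2, the DNN is invoked only on frames 0, 3, 6, and so on, while intermediate frames reuse the previous result, which is the source of the time saving claimed above.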
在步骤102中,对于紧邻关键帧的一个普通帧,本实施例根据该关键帧的目标检测结果(得到了检测目标)重新定位检测目标,对于其他普通帧,本实施例根据上一个普通帧的目标重新定位结果(得到了追踪目标)重新定位追踪目标。由于重新定位结果得到的追踪目标也是关键帧上的检测目标,为了方便说明,将目标重新定位结果称为目标检测结果,将追踪目标称为检测目标。也即,步骤102提到的目标检测结果包含了对关键帧进行目标检测得到的目标检测结果,也包含了对普通帧进行目标重定位得到的目标重新定位结果。同理,步骤102提到的检测目标包含了对关键帧进行目标检测得到的检测目标,也包含了对普通帧进行目标重定位得到的追踪目标。 In step 102, for a normal frame in the immediate vicinity of the key frame, the embodiment re-locates the detection target according to the target detection result of the key frame (the detection target is obtained). For other normal frames, the embodiment is based on the previous normal frame. Target retargeting results (getting tracked targets) retargeting tracking targets. Since the tracking target obtained by the repositioning result is also the detection target on the key frame, for convenience of explanation, the target repositioning result is referred to as the target detection result, and the tracking target is referred to as the detection target. That is, the target detection result mentioned in step 102 includes the target detection result obtained by performing target detection on the key frame, and also includes the target relocation result obtained by performing target relocation on the normal frame. Similarly, the detection target mentioned in step 102 includes the detection target obtained by performing target detection on the key frame, and also includes the tracking target obtained by performing target relocation on the normal frame.
在步骤102中,可以通过对上述检测目标的边界框进行扩展,得到该检测目标的候选区域。In step 102, a candidate region of the detection target may be obtained by expanding a bounding box of the detection target.
图4示意了针对某个检测目标进行目标重定位的示意图。Figure 4 illustrates a schematic diagram of target relocation for a certain detection target.
如图4所示，对于上一帧上的某一个目标，如图4左边的图像中椭圆形中的目标，本实施例可以搜索针对该目标的当前帧的候选区域，以便在该候选区域中重新定位该目标，如图4右边的图像中加粗线条的矩形内的目标，并为其分配相应的标识。As shown in FIG. 4, for a target on the previous frame, such as the target in the ellipse in the left image of FIG. 4, this embodiment can search a candidate region of the current frame for the target, so as to relocate the target in this candidate region, as shown by the target inside the bold-lined rectangle in the right image of FIG. 4, and assign a corresponding identifier to it.
在本实施例中，可以使用参数λ对原始目标（上一帧上的目标）的边界框进行扩展得到该候选区域。如果原始目标的尺寸为B_w和B_h，那么扩展之后的候选区域的尺寸为：In this embodiment, a parameter λ can be used to expand the bounding box of the original target (the target on the previous frame) to obtain the candidate region. If the size of the original target is B_w and B_h, then the size of the expanded candidate region is:

S_w = λ × B_w

S_h = λ × B_h

在本实施例中，λ的取值可以根据需要设置，优选可以大于1.5。In this embodiment, the value of λ can be set as needed, and is preferably greater than 1.5.
上述对边界框进行扩展以得到该后续区域的方法只是举例说明,本实施例并不以此作为限制,其他可实施的扩展方法也可以适用。The method for extending the bounding box to obtain the subsequent area is only an example, and the embodiment is not limited thereto, and other implementable extension methods are also applicable.
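Purely for illustration (the centre-preserving layout below is an assumption, and λ is a stand-in symbol for the scale factor), the expansion could look like this:

```python
def expand_box(cx, cy, b_w, b_h, lam=1.8):
    """Expand a detection's bounding box of size (b_w, b_h), centred at
    (cx, cy), by factor lam (preferably > 1.5, per the text) to obtain the
    candidate search region.  Centre preservation is an assumption here.
    Returns (x0, y0, width, height)."""
    s_w, s_h = lam * b_w, lam * b_h
    return cx - s_w / 2.0, cy - s_h / 2.0, s_w, s_h
```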
在步骤102中,得到了针对检测目标的候选区域,即可根据该候选区域对该检测目标进行重定位。图5提供了一种重定位的方法,如图5所示,该方法包括:In step 102, a candidate region for the detection target is obtained, and the detection target can be relocated according to the candidate region. Figure 5 provides a method of relocation, as shown in Figure 5, the method includes:
步骤501:使用预定步长遍历上述候选区域,得到对应该检测目标的多个候选目标;Step 501: traverse the candidate area by using a predetermined step size to obtain a plurality of candidate targets corresponding to the detection target;
步骤502:计算每个候选目标与上述检测目标的相似度;Step 502: Calculate the similarity between each candidate target and the detection target.
步骤503:根据上述相似度确定与该检测目标匹配的候选目标,将匹配的候选目标作为该检测目标在当前帧上的追踪目标。Step 503: Determine a candidate target that matches the detection target according to the similarity degree, and use the matched candidate target as the tracking target of the detection target on the current frame.
在本实施例中,如果只有一个候选目标与上述检测目标的相似度大于第一阈值,则将该候选目标作为上述追踪目标,并重置该检测目标对应的追踪数。In this embodiment, if only one candidate target has a similarity with the detection target greater than the first threshold, the candidate target is used as the tracking target, and the tracking number corresponding to the detection target is reset.
在本实施例中，如果有多个候选目标与上述检测目标的相似度大于第一阈值，则从上述多个候选目标中选择具有最大相似度的候选目标作为上述追踪目标，并重置该检测目标对应的追踪数。In this embodiment, if a plurality of candidate targets have a similarity with the detection target greater than the first threshold, the candidate target with the greatest similarity is selected from the plurality of candidate targets as the tracking target, and the tracking count corresponding to the detection target is reset.
在本实施例中，如果所有候选目标与上述检测目标的相似度都不大于第一阈值，则判断该检测目标对应的追踪数是否为0；如果追踪数不为0，则将追踪数减1，并在当前帧上保留该检测目标；如果追踪数为0，则在当前帧上不保留该检测目标。In this embodiment, if the similarity of all candidate targets to the detection target is not greater than the first threshold, it is determined whether the tracking count corresponding to the detection target is 0; if the tracking count is not 0, the tracking count is decremented by 1 and the detection target is retained on the current frame; if the tracking count is 0, the detection target is not retained on the current frame.
在本实施例中,通过使用步长d来遍历上述后续区域,得到对应该检测目标的多个候选目标,如图4右侧的所有矩形所示的范围。通过设置该步长d可以减小后续目标的数量,从而减少计算量。在一个实施方式中,d可以设置为小于4个像素,以保证每个候选目标的像素高于50为宜,但本实施例并不以此作为限制。In the present embodiment, by using the step size d to traverse the subsequent regions, a plurality of candidate targets corresponding to the detection target are obtained, as shown by all the rectangles on the right side of FIG. By setting the step size d, the number of subsequent targets can be reduced, thereby reducing the amount of calculation. In one embodiment, d may be set to be less than 4 pixels to ensure that the pixels of each candidate target are higher than 50, but this embodiment is not limited thereto.
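The stride-d traversal of the candidate region can be sketched as follows (illustrative only; the (x, y, w, h) box convention is an assumption):

```python
def candidate_windows(region, target_w, target_h, d=3):
    """Slide a window of the target's size over the candidate region with
    stride d (here d < 4 pixels, following the text), yielding candidate
    target boxes (x, y, w, h)."""
    rx, ry, rw, rh = region
    boxes = []
    y = ry
    while y + target_h <= ry + rh:
        x = rx
        while x + target_w <= rx + rw:
            boxes.append((x, y, target_w, target_h))
            x += d
        y += d
    return boxes
```

A smaller d produces more candidates (higher cost, finer localization); a larger d produces fewer, which is the trade-off the paragraph above describes.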
在本实施例中,可以通过计算图像特征之间的相似度来确定候选目标是否与检测目标相匹配。在一个实施方式中,可以使用梯度差,但本实施例并不以此作为限制,其他计算相似度的方法也适用。由于计算相似度的方法是现有技术,本实施例对此不再详细说明。In the present embodiment, whether the candidate target matches the detection target can be determined by calculating the similarity between the image features. In one embodiment, a gradient difference can be used, but the embodiment is not limited thereto, and other methods of calculating the similarity are also applicable. Since the method of calculating the similarity is the prior art, this embodiment will not be described in detail.
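As one possible reading of the "gradient difference" similarity mentioned above (the disclosure names the feature but does not fix a formula, so the mapping to a score is an assumption), a sketch:

```python
def gradients(patch):
    """Horizontal/vertical finite-difference gradients of a 2-D grayscale
    patch given as a list of equally sized rows."""
    h, w = len(patch), len(patch[0])
    gx = [[patch[y][min(x + 1, w - 1)] - patch[y][x] for x in range(w)]
          for y in range(h)]
    gy = [[patch[min(y + 1, h - 1)][x] - patch[y][x] for x in range(w)]
          for y in range(h)]
    return gx, gy

def gradient_similarity(a, b):
    """Similarity score in (0, 1]: 1 for identical gradient fields,
    decreasing as the mean absolute gradient difference grows."""
    (axg, ayg), (bxg, byg) = gradients(a), gradients(b)
    n, diff = 0, 0.0
    for ga, gb in ((axg, bxg), (ayg, byg)):
        for ra, rb in zip(ga, gb):
            for va, vb in zip(ra, rb):
                diff += abs(va - vb)
                n += 1
    return 1.0 / (1.0 + diff / n)
```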
在本实施例中，通过设置第一阈值来找到与检测目标匹配的候选目标，然而，如果所有的候选目标与检测目标的相似度都不大于该第一阈值，也就是说，在当前帧没有找到该检测目标，那么本实施例不是直接在当前帧去除该检测目标，而是通过追踪数（keep_track）这个参数保留几帧的该检测目标，以避免判断错误。In this embodiment, the candidate target matching the detection target is found by setting the first threshold. However, if the similarity of all candidate targets to the detection target is not greater than the first threshold, that is, the detection target is not found in the current frame, this embodiment does not directly remove the detection target from the current frame, but retains the detection target for several frames via the tracking-count (keep_track) parameter, so as to avoid an erroneous judgment.
在本实施例中，为关键帧上的每一个检测目标设置了一个参数，称为keep_track，如果当前帧上没有与原始检测目标匹配的候选目标，本实施例并不马上删除该检测目标的标识，而是使keep_track的值减1，并保留该检测目标而不更新其位置，直到keep_track的值等于0。如果检测目标在keep_track为0之前与某个候选目标匹配，该keep_track将被重置。在本实施例中，keep_track的值可被设置为[3, 5]，也即其可以为3或4或5，然而，本实施例并不以此作为限制，其也可以设置为其他值。In this embodiment, a parameter called keep_track is set for each detection target on the key frame. If there is no candidate target on the current frame matching the original detection target, this embodiment does not immediately delete the identifier of the detection target; instead, the value of keep_track is decremented by one and the detection target is retained without updating its position, until the value of keep_track equals 0. If the detection target is matched with a candidate target before keep_track reaches 0, keep_track is reset. In this embodiment, the value of keep_track can be set within [3, 5], that is, it can be 3, 4 or 5; however, this embodiment is not limited thereto, and other values may also be used.
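The keep_track bookkeeping described above can be sketched as follows (illustrative; the dict layout and the reset value of 4, chosen from the [3, 5] range, are assumptions):

```python
KEEP_TRACK_INIT = 4  # within the [3, 5] range suggested in the text

def update_track(target, candidates, sims, th1):
    """Apply the keep_track rule for one detection target.

    target: dict with 'box' and 'keep_track'; candidates: list of boxes;
    sims: similarity of each candidate to the target.  Returns the updated
    target dict, or None when the target should no longer be kept."""
    matched = [(s, box) for s, box in zip(sims, candidates) if s > th1]
    if matched:                      # one or more matches: take the best
        best = max(matched, key=lambda t: t[0])
        return {'box': best[1], 'keep_track': KEEP_TRACK_INIT}
    if target['keep_track'] > 0:     # no match: keep old position a while
        return {'box': target['box'], 'keep_track': target['keep_track'] - 1}
    return None                      # keep_track exhausted: drop the target
```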
在本实施例中，如果该视频图像是高分辨率的，本实施例的方法实施之前还可以对该视频图像的每一帧图像（关键帧和普通帧）进行下采样，例如使用因子δ对每一帧图像的目标区域或候选目标区域进行下采样，本实施例对下采样的具体实施方式不做限制，可以采用现有手段。In this embodiment, if the video image is of high resolution, each frame of the video image (key frames and normal frames) may also be downsampled before the method of this embodiment is applied, for example, downsampling the target region or candidate target region of each frame by a factor δ. The specific implementation of downsampling is not limited in this embodiment, and existing means may be used.
图6是本实施例的重定位方法的一个实施方式的整体流程图,请参照图6,该方法包括:FIG. 6 is an overall flowchart of an embodiment of a relocation method according to the embodiment. Referring to FIG. 6, the method includes:
步骤601:对上一帧图像的目标区域进行下采样并进行特征提取;Step 601: Downsampling a target area of the previous frame image and performing feature extraction;
步骤602:对当前帧图像的候选目标区域进行下采样并进行特征提取;Step 602: Downsampling a candidate target area of the current frame image and performing feature extraction.
步骤603:特征匹配,得到匹配分数;Step 603: Feature matching, and obtaining a matching score;
步骤604:判断匹配分数是否大于第一阈值,如果判断为是,则执行步骤605;否则执行步骤606; Step 604: Determine whether the matching score is greater than the first threshold, if the determination is yes, proceed to step 605; otherwise, perform step 606;
步骤605:选择最佳的候选目标;Step 605: Select an optimal candidate target;
步骤606:将keep_track减1;Step 606: Decrease keep_track by one;
步骤607:判断keep_track是否为0,如果判断为否,则在当前帧保留该检测目标的位置;否则在当前帧不保留该检测目标的位置。Step 607: Determine whether the keep_track is 0. If the determination is no, the position of the detection target is reserved in the current frame; otherwise, the position of the detection target is not retained in the current frame.
在步骤601和步骤602中，所说的特征提取是指提取上一帧图像上的检测目标/当前帧图像上的候选目标的特征，以便进行特征匹配，例如计算相似度等。并且，本实施例对步骤601和步骤602的执行顺序不作限制，可以同时执行，也可以分开执行。In step 601 and step 602, the feature extraction refers to extracting features of the detection target on the previous frame image / the candidate targets on the current frame image for feature matching, for example calculating similarity. Moreover, this embodiment does not limit the execution order of step 601 and step 602; they may be performed simultaneously or separately.
在本实施例中，当通过步骤102对下一个关键帧之前的最后一个普通帧进行处理完毕之后，本实施例进入对下一个关键帧（也即除第一个关键帧以外的其他关键帧）的处理（步骤103）。In this embodiment, after the last normal frame before the next key frame has been processed in step 102, this embodiment proceeds to the processing of the next key frame (that is, a key frame other than the first key frame) (step 103).
在步骤103中，对于该下一个关键帧，除了进行与步骤101相同的目标检测处理以外，还要根据上一帧（上述最后一个普通帧）的重新定位结果，对该关键帧上的检测目标进行整合，也即，根据上一个普通帧中的追踪目标，将目标标识赋予当前关键帧的检测结果，由此可以确定是否有目标已经离开了，是否有新的目标进入等。In step 103, for the next key frame, in addition to the same target detection processing as in step 101, the detection targets on this key frame are also integrated according to the repositioning result of the previous frame (the last normal frame described above); that is, target identifiers are assigned to the detection results of the current key frame according to the tracking targets in the previous normal frame, whereby it can be determined whether a target has left, whether a new target has entered, and so on.
图7提供了一种整合方法,如图7所示,该方法包括:Figure 7 provides an integration method, as shown in Figure 7, which includes:
步骤701:将所述其他关键帧上的每个检测目标与前一帧上的每个候选目标进行匹配;Step 701: Match each detection target on the other key frames with each candidate target on the previous frame;
步骤702：如果所述检测目标与所述候选目标的重叠区域大于第二阈值，并且所述检测目标与所述候选目标的匹配分数大于第一阈值，则为所述检测目标分配匹配上的候选目标的标识；Step 702: If the overlap region of the detection target and the candidate target is greater than the second threshold, and the matching score of the detection target and the candidate target is greater than the first threshold, assign the identifier of the matched candidate target to the detection target;
步骤703：如果所述检测目标与所述候选目标的重叠区域不大于第二阈值，或者所述检测目标与所述候选目标的匹配分数不大于第一阈值，则为所述检测目标分配新的标识。Step 703: If the overlap region of the detection target and the candidate target is not greater than the second threshold, or the matching score of the detection target and the candidate target is not greater than the first threshold, assign a new identifier to the detection target.
在本实施例中，如果该其他关键帧上的所有检测目标被分配了标识，而上述前一帧上仍然有没有匹配上的候选目标，则不在该其他关键帧上保留没有匹配上的候选目标。In this embodiment, if all detection targets on the other key frame have been assigned identifiers while there are still unmatched candidate targets on the previous frame, the unmatched candidate targets are not retained on the other key frame.
图8是本实施例的该整合方法的一个实施方式的整体流程图,请参照图8,该方法包括:FIG. 8 is an overall flowchart of an embodiment of the integration method of the embodiment. Referring to FIG. 8, the method includes:
步骤801：计算重叠矩阵IOU；Step 801: Compute the overlap matrix IOU;
步骤802：判断IOU_ij是否大于第二阈值，如果判断为是，则执行步骤803，否则为目标i分配新的标识；Step 802: Determine whether IOU_ij is greater than the second threshold; if yes, go to step 803; otherwise, assign a new identifier to target i;
步骤803:图像特征匹配;Step 803: Image feature matching;
步骤804:判断匹配分数是否大于第一阈值,如果判断为是,则为目标i分配目标j的标识,此时IOU矩阵为(N-1)×(M-1)的矩阵,并执行步骤805,否则为目标i分配新的标识,此时IOU矩阵为(N-1)×M的矩阵;Step 804: Determine whether the matching score is greater than the first threshold. If the determination is yes, assign the identifier of the target j to the target i. At this time, the IOU matrix is a matrix of (N-1)×(M-1), and step 805 is performed. Otherwise, a new identifier is assigned to the target i, and the IOU matrix is a matrix of (N-1)×M;
步骤805:判断IOU行数是否大于0,如果判断为是,则回到步骤801,否则执行步骤806;Step 805: Determine whether the number of IOU rows is greater than 0. If the determination is yes, go back to step 801, otherwise go to step 806;
步骤806:判断IOU列数是否大于0,如果判断为是,则在当前关键帧上不保留没匹配上的候选目标,否则结束。Step 806: Determine whether the number of IOU columns is greater than 0. If the determination is yes, the candidate targets that are not matched are not retained on the current key frame, otherwise the process ends.
在本实施例中，如图8所示，首先计算来自新的关键帧的检测目标与来自上一个普通帧的追踪目标之间的重叠，得到N×M的IOU（Intersection Over Union，交除并）矩阵，对于该关键帧的检测目标i，如果普通帧中的追踪目标j和该关键帧的检测目标i有重叠，重叠区域IOU_ij大于第二阈值th2，并且，这两个目标的匹配分数大于第一阈值，那么对该检测目标i赋予与追踪目标j相同的标识，否则，对该检测目标i赋予一个新的标识。In this embodiment, as shown in FIG. 8, the overlap between the detection targets from the new key frame and the tracking targets from the previous normal frame is first calculated to obtain an N×M IOU (Intersection Over Union) matrix. For detection target i of the key frame, if tracking target j in the normal frame overlaps detection target i, the overlap region IOU_ij is greater than the second threshold th2, and the matching score of the two targets is greater than the first threshold, then detection target i is given the same identifier as tracking target j; otherwise, a new identifier is assigned to detection target i.
在本实施例中,图像特征匹配的过程与前述相同,例如通过计算相似度(匹配分数)的方法来实现,此处不再赘述。In this embodiment, the process of image feature matching is the same as the foregoing, for example, by calculating a similarity (matching score), and details are not described herein again.
在本实施例中,如果追踪目标j与检测目标i匹配,那么将从IOU矩阵中删除第j列,即目标j不再被匹配。因此,为了鲁棒性,首先对IOU矩阵各行的最大值进行由大到小的排序,并据此安排IOU矩阵的各行位置,进而从矩阵第一行开始进行匹配。如果关键帧中所有的检测目标被分配了标识,而普通帧中仍然有没有匹配上的追踪目标,那么这些目标将被去除。通过这样的处理,可以确定目标是否离开了视觉范围,是否有新的目标进入等,并得出视觉范围内的目标数量。In the present embodiment, if the tracking target j matches the detection target i, the jth column will be deleted from the IOU matrix, that is, the target j is no longer matched. Therefore, for robustness, the maximum values of the rows of the IOU matrix are first sorted from large to small, and the row positions of the IOU matrix are arranged accordingly, and then the matching is performed from the first row of the matrix. If all the detection targets in the key frame are assigned an identifier, and there are still matching tracking targets in the normal frame, then these targets will be removed. Through such processing, it is possible to determine whether the target has left the visual range, whether there is a new target entering, etc., and to derive the number of targets within the visual range.
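A minimal sketch of this integration step (IoU matrix, row ordering by maximum overlap, greedy column deletion), under the assumptions that boxes are (x, y, w, h) tuples and `match_score` is a caller-supplied appearance matcher:

```python
def iou(a, b):
    """Intersection-over-union of boxes (x, y, w, h)."""
    ax0, ay0, aw, ah = a
    bx0, by0, bw, bh = b
    ix = max(0.0, min(ax0 + aw, bx0 + bw) - max(ax0, bx0))
    iy = max(0.0, min(ay0 + ah, by0 + bh) - max(ay0, by0))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

def integrate_key_frame(dets, tracks, match_score, th1, th2, next_id):
    """Assign IDs to key-frame detections from previous-frame tracks.

    dets: list of boxes; tracks: list of (id, box); match_score(i, j) gives
    the appearance-matching score of detection i vs track j (hypothetical).
    Rows are processed in order of decreasing best IoU, as in the text."""
    remaining = list(range(len(tracks)))
    ids = [None] * len(dets)
    order = sorted(range(len(dets)), key=lambda i: -max(
        [iou(dets[i], tracks[j][1]) for j in remaining] or [0.0]))
    for i in order:
        best_j, best_iou = None, th2
        for j in remaining:
            v = iou(dets[i], tracks[j][1])
            if v > best_iou and match_score(i, j) > th1:
                best_j, best_iou = j, v
        if best_j is None:
            ids[i] = next_id           # new target enters the scene
            next_id += 1
        else:
            ids[i] = tracks[best_j][0]  # inherit the track's identifier
            remaining.remove(best_j)    # column j matched: delete it
    return ids  # tracks left unmatched are simply dropped (targets that left)
```

Detections whose best remaining overlap fails th2 (or whose appearance score fails th1) receive fresh identifiers, corresponding to targets entering the scene, while unmatched tracks are discarded as targets that have left the visual range.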
通过本实施例的方法,只对视频图像的关键帧进行目标检测,而对视频图像的普通帧进行目标重定位,能够提高目标检测的精度并减少时间消耗。With the method of the embodiment, only the target detection of the key frame of the video image is performed, and the target relocation of the normal frame of the video image can improve the accuracy of the target detection and reduce the time consumption.
实施例2Example 2
本实施例提供了一种目标检测装置,由于该装置解决问题的原理与实施例1的方法类似,因此其具体的实施可以参考实施例1的方法的实施,内容相同之处不再重复 说明。The embodiment of the present invention provides a target detection device. The principle of the device is similar to that of the first embodiment. Therefore, the specific implementation may refer to the implementation of the method in the first embodiment. Description.
FIG. 9 is a schematic diagram of the target detection apparatus of this embodiment. As shown in FIG. 9, the apparatus 900 includes a first detecting unit 901, a relocating unit 902 and a second detecting unit 903.
The first detecting unit 901 is configured to, for the first key frame of a video image, detect targets on the first key frame based on a deep neural network, obtain all detection targets on the first key frame, and assign an identifier to each detection target.
The relocating unit 902 is configured to, for each normal frame following a key frame, determine a candidate region on the current frame corresponding to each detection target according to the target detection result of the previous frame, and relocate each detection target according to the candidate region to obtain the tracking targets on the current frame.
The second detecting unit 903 is configured to, for each other key frame of the video image, detect targets on that key frame based on the deep neural network, obtain all detection targets on that key frame, and integrate the detection targets on that key frame according to the relocation result of the previous frame.
FIG. 10 is a schematic diagram of an implementation of the relocating unit 902 of this embodiment.
As shown in FIG. 10, in this implementation the relocating unit 902 may include an expanding unit 1001, which expands the bounding box of a detection target to obtain the candidate region of that detection target. For a specific expansion method, reference may be made to the description of FIG. 4.
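The exact expansion rule is deferred to the description of FIG. 4; a common and minimal sketch is to scale the bounding box about its center and clip to the frame. The scale factor and function name below are assumptions, not the patent's specified method:

```python
def expand_box(box, scale, frame_w, frame_h):
    """Expand (x1, y1, x2, y2) about its center by `scale`, clipped to the frame.
    The scale factor is illustrative; the patent defers the exact rule to FIG. 4."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    hw, hh = (x2 - x1) * scale / 2.0, (y2 - y1) * scale / 2.0
    return (max(0.0, cx - hw), max(0.0, cy - hh),
            min(float(frame_w), cx + hw), min(float(frame_h), cy + hh))
```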
As shown in FIG. 10, in this implementation the relocating unit 902 may further include a traversing unit 1002, a calculating unit 1003 and a determining unit 1004. The traversing unit 1002 may traverse the candidate region with a predetermined step size to obtain a plurality of candidate targets corresponding to the detection target; the calculating unit 1003 may calculate the similarity between each candidate target and the detection target; and the determining unit 1004 may determine, according to the similarities, the candidate target that matches the detection target, taking the matched candidate target as the tracking target of the detection target on the current frame.
In this implementation, when one candidate target has a similarity to the detection target greater than a first threshold, the determining unit 1004 may take that candidate target as the tracking target and reset the tracking count of the detection target.
In this implementation, when a plurality of candidate targets have similarities to the detection target greater than the first threshold, the determining unit 1004 may select the candidate target with the greatest similarity from among them as the tracking target and reset the tracking count of the detection target.
In this implementation, when no candidate target has a similarity to the detection target greater than the first threshold, the determining unit 1004 may further judge whether the tracking count is 0; if the tracking count is not 0, it decrements the tracking count by 1 and retains the detection target on the current frame; if the tracking count is 0, the detection target is not retained on the current frame.
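The three decision branches of the determining unit 1004 can be sketched together as one function. The `track` dictionary layout and the reset value `max_track` are assumptions; the patent does not specify what the tracking count is reset to:

```python
def relocate(similarities, candidates, track, th1, max_track=5):
    """Pick the tracking target for one detection from its candidate windows.
    `track` holds the remaining tracking count and the last known box;
    `max_track` (the reset value) is an assumed parameter, not given in the text."""
    best = max(range(len(similarities)), key=similarities.__getitem__, default=-1)
    if best >= 0 and similarities[best] > th1:
        track['count'] = max_track           # reset the tracking count
        return candidates[best], True        # matched: keep on current frame
    if track['count'] > 0:
        track['count'] -= 1                  # no match: decay and keep the last box
        return track['box'], True
    return None, False                       # count exhausted: drop the target
```

Selecting the maximum similarity also covers the single-candidate case, so both threshold branches of the text collapse into the first `if`.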
In this embodiment, as shown in FIG. 9, the apparatus 900 may further include a downsampling unit 904, which may downsample the target regions or candidate target regions on the key frames and/or the normal frames of the video image for image-feature matching. This is as described above and is not repeated here.
FIG. 11 is a schematic diagram of an implementation of the second detecting unit 903 of this embodiment.
As shown in FIG. 11, in this implementation the second detecting unit 903 may include a matching unit 1101 and a processing unit 1102. The matching unit 1101 is configured to match each detection target on the other key frame with each candidate target on the previous frame. The processing unit 1102 is configured to, when the overlap between the detection target and the candidate target is greater than a second threshold and the matching score between them is greater than a first threshold, assign the identifier of the matched candidate target to the detection target; and, when the overlap is not greater than the second threshold or the matching score is not greater than the first threshold, assign a new identifier to the detection target.
In this implementation, when all detection targets on the other key frame have been assigned identifiers while unmatched candidate targets remain on the previous frame, the processing unit 1102 may further refrain from retaining the unmatched candidate targets on the other key frame.
With the apparatus of this embodiment, target detection is performed only on the key frames of the video image, while targets are merely relocated on the normal frames, which improves the accuracy of target detection and reduces time consumption.
Embodiment 3
This embodiment further provides a computer system configured with the target detection apparatus 900 described above.
FIG. 12 is a schematic block diagram of the system configuration of a computer system 1200 according to an embodiment of the present invention. As shown in FIG. 12, the computer system 1200 may include a central processing unit 1201 and a memory 1202, the memory 1202 being coupled to the central processing unit 1201. It should be noted that the figure is exemplary; other types of structures may be used in addition to or in place of this structure to implement telecommunications or other functions.
In one implementation, the functions of the target detection apparatus 900 may be integrated into the central processing unit 1201, which may be configured to implement the target detection method described in Embodiment 1.
For example, the central processing unit 1201 may be configured to perform the following control: for the first key frame of a video image, detect targets on the first key frame based on a deep neural network, obtain all detection targets on the first key frame, and assign an identifier to each detection target; for each normal frame following a key frame, determine a candidate region on the current frame corresponding to each detection target according to the target detection result of the previous frame, relocate each detection target according to the candidate region, and obtain the tracking targets on the current frame; and for each other key frame of the video image, detect targets on that key frame based on the deep neural network, obtain all detection targets on that key frame, and integrate the detection targets on that key frame according to the relocation result of the previous frame.
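The overall key-frame/normal-frame dispatch performed under this control can be sketched as a small driver loop. The fixed `keyframe_interval` and the three callables are placeholders for the DNN detector, the relocation of Embodiment 1 and the identifier integration; none of these names come from the patent:

```python
def process_video(frames, keyframe_interval, detect, relocate_all, integrate):
    """Run full DNN detection on key frames only; relocate targets on normal frames.
    `detect`, `relocate_all` and `integrate` stand in for the detector, the
    candidate-region relocation and the ID integration steps described above."""
    results = []
    prev = None
    for n, frame in enumerate(frames):
        if n % keyframe_interval == 0:
            dets = detect(frame)                      # full detection on key frame
            prev = dets if prev is None else integrate(dets, prev)
        else:
            prev = relocate_all(frame, prev)          # cheap relocation on normal frame
        results.append(prev)
    return results
```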
In another implementation, the target detection apparatus 900 may be configured separately from the central processing unit 1201; for example, the target detection apparatus 900 may be configured as a chip connected to the central processing unit 1201, with the functions of the target detection apparatus 900 realized under the control of the central processing unit 1201.
As shown in FIG. 12, the computer system 1200 may further include an input unit 1203, an audio processing unit 1204, a display 1205 and a power supply 1206. It should be noted that the computer system 1200 does not necessarily include all of the components shown in FIG. 12; moreover, the computer system 1200 may also include components not shown in FIG. 12, for which reference may be made to the prior art.
As shown in FIG. 12, the central processing unit 1201, sometimes also referred to as a controller or operation control, may include a microprocessor or other processor device and/or logic device; the central processing unit 1201 receives input and controls the operation of the components of the computer system 1200.
The memory 1202 may be, for example, one or more of a buffer, a flash memory, a hard drive, a removable medium, a volatile memory, a non-volatile memory or other suitable devices. It may store the above video images, feature-matching information and the like, and may further store programs for executing the related processing. The central processing unit 1201 may execute the programs stored in the memory 1202 to realize information storage or processing. The functions of the other components are similar to existing ones and are not described here. The components of the computer system 1200 may be implemented by dedicated hardware, firmware, software or a combination thereof without departing from the scope of the invention.
In this embodiment, the computer system may be a video surveillance system, but this is not a limitation.
With the computer system of this embodiment, target detection is performed only on the key frames of the video image, while targets are merely relocated on the normal frames, which improves the accuracy of target detection and reduces time consumption.
An embodiment of the present invention further provides a computer-readable program, wherein, when the program is executed in a target detection apparatus or a computer system, it causes the target detection apparatus or computer system to execute the target detection method described in Embodiment 1.
An embodiment of the present invention further provides a storage medium storing a computer-readable program, wherein the computer-readable program causes a target detection apparatus or a computer system to execute the target detection method described in Embodiment 1.
The above apparatus and method of the present invention may be implemented by hardware, or by hardware in combination with software. The present invention relates to a computer-readable program which, when executed by a logic component, enables the logic component to realize the apparatus or constituent components described above, or to realize the various methods or steps described above. The present invention also relates to storage media for storing the above program, such as a hard disk, a magnetic disk, an optical disc, a DVD, a flash memory and the like.
The target detection method in the target detection apparatus described in connection with the embodiments of the present invention may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. For example, one or more of the functional blocks shown in FIGS. 9-11, and/or one or more combinations of those functional blocks, may correspond to software modules of a computer program flow, or to hardware modules. These software modules may correspond to the steps shown in FIGS. 1, 5 and 7 respectively. The hardware modules may be realized, for example, by solidifying these software modules in a field-programmable gate array (FPGA).
A software module may reside in RAM, flash memory, ROM, EPROM, EEPROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. A storage medium may be coupled to the processor so that the processor can read information from, and write information to, the storage medium; or the storage medium may be an integral part of the processor. The processor and the storage medium may reside in an ASIC. The software module may be stored in the memory of a mobile terminal, or in a memory card insertable into the mobile terminal. For example, if a device (such as a mobile terminal) uses a large-capacity MEGA-SIM card or a large-capacity flash memory device, the software module may be stored in the MEGA-SIM card or the large-capacity flash memory device.
One or more of the functional blocks described with respect to FIGS. 9-11, and/or one or more combinations of those functional blocks, may be implemented as a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any suitable combination thereof designed to perform the functions described in this application. They may also be implemented as a combination of computing devices, for example a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in communication with a DSP, or any other such configuration.
The present invention has been described above with reference to specific embodiments, but it should be clear to those skilled in the art that these descriptions are exemplary and do not limit the protection scope of the present invention. Those skilled in the art may make various variations and modifications to the present invention based on its principles, and such variations and modifications also fall within the scope of the present invention.

Claims (19)

  1. A target detection method, wherein the method comprises:
    for the first key frame of a video image, detecting targets on the first key frame based on a deep neural network, obtaining all detection targets on the first key frame, and assigning an identifier to each detection target;
    for each normal frame following a key frame, determining a candidate region on the current frame corresponding to each detection target according to the target detection result of the previous frame, and relocating each detection target according to the candidate region to obtain tracking targets on the current frame; and
    for each other key frame of the video image, detecting targets on the other key frame based on the deep neural network, obtaining all detection targets on the other key frame, and integrating the detection targets on the other key frame according to the relocation result of the previous frame.
  2. The method according to claim 1, wherein determining a candidate region on the current frame corresponding to each detection target comprises:
    expanding the bounding box of the detection target to obtain the candidate region of the detection target.
  3. The method according to claim 1, wherein relocating each detection target according to the candidate region comprises:
    traversing the candidate region with a predetermined step size to obtain a plurality of candidate targets corresponding to the detection target;
    calculating the similarity between each candidate target and the detection target; and
    determining, according to the similarities, a candidate target that matches the detection target, and taking the matched candidate target as the tracking target of the detection target on the current frame.
  4. The method according to claim 3, wherein determining a candidate target that matches the detection target according to the similarities comprises:
    if the similarity between one candidate target and the detection target is greater than a first threshold, taking that candidate target as the tracking target, and resetting the tracking count of the detection target.
  5. The method according to claim 3, wherein determining a candidate target that matches the detection target according to the similarities comprises:
    if the similarities between a plurality of candidate targets and the detection target are greater than a first threshold, selecting the candidate target with the greatest similarity from the plurality of candidate targets as the tracking target, and resetting the tracking count of the detection target.
  6. The method according to claim 3, wherein determining a candidate target that matches the detection target according to the similarities comprises:
    if no candidate target has a similarity to the detection target greater than the first threshold, judging whether the tracking count is 0;
    if the tracking count is not 0, decrementing the tracking count by 1 and retaining the detection target on the current frame; and
    if the tracking count is 0, not retaining the detection target on the current frame.
  7. The method according to claim 1, wherein the method further comprises:
    downsampling the target regions or candidate target regions on the key frames and/or the normal frames.
  8. The method according to claim 1, wherein integrating the detection targets on the other key frame comprises:
    matching each detection target on the other key frame with each candidate target on the previous frame;
    if the overlap between the detection target and the candidate target is greater than a second threshold and the matching score between the detection target and the candidate target is greater than a first threshold, assigning the identifier of the matched candidate target to the detection target; and
    if the overlap between the detection target and the candidate target is not greater than the second threshold, or the matching score between the detection target and the candidate target is not greater than the first threshold, assigning a new identifier to the detection target.
  9. The method according to claim 8, wherein,
    if all detection targets on the other key frame have been assigned identifiers while unmatched candidate targets remain on the previous frame, the unmatched candidate targets are not retained on the other key frame.
  10. A target detection apparatus, wherein the apparatus comprises:
    a first detecting unit configured to, for the first key frame of a video image, detect targets on the first key frame based on a deep neural network, obtain all detection targets on the first key frame, and assign an identifier to each detection target;
    a relocating unit configured to, for each normal frame following a key frame, determine a candidate region on the current frame corresponding to each detection target according to the target detection result of the previous frame, and relocate each detection target according to the candidate region to obtain tracking targets on the current frame; and
    a second detecting unit configured to, for each other key frame of the video image, detect targets on the other key frame based on the deep neural network, obtain all detection targets on the other key frame, and integrate the detection targets on the other key frame according to the relocation result of the previous frame.
  11. The apparatus according to claim 10, wherein the relocating unit comprises:
    an expanding unit that expands the bounding box of the detection target to obtain the candidate region of the detection target.
  12. The apparatus according to claim 10, wherein the relocating unit comprises:
    a traversing unit that traverses the candidate region with a predetermined step size to obtain a plurality of candidate targets corresponding to the detection target;
    a calculating unit that calculates the similarity between each candidate target and the detection target; and
    a determining unit that determines, according to the similarities, a candidate target matching the detection target, and takes the matched candidate target as the tracking target of the detection target on the current frame.
  13. The apparatus according to claim 12, wherein,
    when the similarity between one candidate target and the detection target is greater than a first threshold, the determining unit takes that candidate target as the tracking target and resets the tracking count of the detection target.
  14. The apparatus according to claim 12, wherein,
    when the similarities between a plurality of candidate targets and the detection target are greater than the first threshold, the determining unit selects the candidate target with the greatest similarity from the plurality of candidate targets as the tracking target and resets the tracking count of the detection target.
  15. The apparatus according to claim 12, wherein,
    when no candidate target has a similarity to the detection target greater than the first threshold, the determining unit judges whether the tracking count is 0; if the tracking count is not 0, it decrements the tracking count by 1 and retains the detection target on the current frame; if the tracking count is 0, the detection target is not retained on the current frame.
  16. The apparatus according to claim 10, wherein the apparatus further comprises:
    a downsampling unit that downsamples the target regions or candidate target regions on the key frames and/or the normal frames.
  17. The apparatus according to claim 10, wherein the second detecting unit comprises:
    a matching unit that matches each detection target on the other key frame with each candidate target on the previous frame; and
    a processing unit that, when the overlap between the detection target and the candidate target is greater than a second threshold and the matching score between the detection target and the candidate target is greater than a first threshold, assigns the identifier of the matched candidate target to the detection target; and, when the overlap between the detection target and the candidate target is not greater than the second threshold or the matching score between the detection target and the candidate target is not greater than the first threshold, assigns a new identifier to the detection target.
  18. The apparatus according to claim 17, wherein,
    when all detection targets on the other key frame have been assigned identifiers while unmatched candidate targets remain on the previous frame, the processing unit does not retain the unmatched candidate targets on the other key frame.
  19. A computer system, wherein the computer system comprises the apparatus according to any one of claims 10-18.
PCT/CN2016/101237 2016-09-30 2016-09-30 Target detection method and device, and computer system WO2018058595A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201680087590.3A CN109416728A (en) 2016-09-30 2016-09-30 Object detection method, device and computer system
PCT/CN2016/101237 WO2018058595A1 (en) 2016-09-30 2016-09-30 Target detection method and device, and computer system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2016/101237 WO2018058595A1 (en) 2016-09-30 2016-09-30 Target detection method and device, and computer system

Publications (1)

Publication Number Publication Date
WO2018058595A1 true WO2018058595A1 (en) 2018-04-05

Family

ID=61763242

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/101237 WO2018058595A1 (en) 2016-09-30 2016-09-30 Target detection method and device, and computer system

Country Status (2)

Country Link
CN (1) CN109416728A (en)
WO (1) WO2018058595A1 (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109544603A (en) * 2018-11-28 2019-03-29 上饶师范学院 Method for tracking target based on depth migration study
CN109902627A (en) * 2019-02-28 2019-06-18 中科创达软件股份有限公司 A kind of object detection method and device
CN109903281A (en) * 2019-02-28 2019-06-18 中科创达软件股份有限公司 It is a kind of based on multiple dimensioned object detection method and device
CN110147702A (en) * 2018-07-13 2019-08-20 腾讯科技(深圳)有限公司 A kind of object detection and recognition method and system of real-time video
CN110322475A (en) * 2019-05-23 2019-10-11 北京中科晶上科技股份有限公司 A kind of sparse detection method of video
CN110363790A (en) * 2018-04-11 2019-10-22 北京京东尚科信息技术有限公司 Target tracking method, device and computer readable storage medium
CN110428442A (en) * 2019-08-07 2019-11-08 北京百度网讯科技有限公司 Target determines method, targeting system and monitoring security system
CN111179304A (en) * 2018-11-09 2020-05-19 北京京东尚科信息技术有限公司 Object association method, device and computer-readable storage medium
CN111461010A (en) * 2020-04-01 2020-07-28 贵州电网有限责任公司 Power equipment identification efficiency optimization method based on template tracking
CN112634327A (en) * 2020-12-21 2021-04-09 合肥讯图信息科技有限公司 Tracking method based on YOLOv4 model
CN113312949A (en) * 2020-04-13 2021-08-27 阿里巴巴集团控股有限公司 Video data processing method, video data processing device and electronic equipment
CN115063454A (en) * 2022-08-16 2022-09-16 浙江所托瑞安科技集团有限公司 Multi-target tracking matching method, device, terminal and storage medium

Families Citing this family (7)

Publication number Priority date Publication date Assignee Title
CN109977797B (en) * 2019-03-06 2023-06-20 上海交通大学 Optimization method of first-order target detector based on sorting loss function
KR102340988B1 (en) * 2019-10-04 2021-12-17 에스케이텔레콤 주식회사 Method and Apparatus for Detecting Objects from High Resolution Image
CN112686925A (en) * 2019-10-18 2021-04-20 西安光启未来技术研究院 Target tracking method and device
CN110956219B (en) * 2019-12-09 2023-11-14 爱芯元智半导体(宁波)有限公司 Video data processing method, device and electronic system
CN113079342A (en) * 2020-01-03 2021-07-06 深圳市春盛海科技有限公司 Target tracking method and system based on high-resolution image device
CN113065523B (en) * 2021-04-26 2023-06-16 上海哔哩哔哩科技有限公司 Target tracking method and device, electronic equipment and storage medium
CN114581781B (en) 2022-05-05 2022-08-09 之江实验室 Target detection method and device for high-resolution remote sensing image

Citations (4)

Publication number Priority date Publication date Assignee Title
US20080232643A1 (en) * 2007-03-23 2008-09-25 Technion Research & Development Foundation Ltd. Bitmap tracker for visual tracking under very general conditions
US20120207356A1 (en) * 2011-02-10 2012-08-16 Murphy William A Targeted content acquisition using image analysis
CN104166861A (en) * 2014-08-11 2014-11-26 叶茂 Pedestrian detection method
CN105447458A (en) * 2015-11-17 2016-03-30 深圳市商汤科技有限公司 Large scale crowd video analysis system and method thereof

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101329766B (en) * 2007-06-18 2012-05-30 索尼(中国)有限公司 Apparatus, method and system for analyzing moving images
CN102722714B (en) * 2012-05-18 2014-07-23 西安电子科技大学 Extended learning method for artificial neural networks based on target tracking
CN105224856A (en) * 2014-07-02 2016-01-06 腾讯科技(深圳)有限公司 Computer system detection method and device
CN105718890A (en) * 2016-01-22 2016-06-29 北京大学 Method for detecting specific videos based on a convolutional neural network
CN105760846B (en) * 2016-03-01 2019-02-15 北京正安维视科技股份有限公司 Target detection and localization method and system based on depth data

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110363790A (en) * 2018-04-11 2019-10-22 北京京东尚科信息技术有限公司 Target tracking method, device and computer readable storage medium
CN110147702A (en) * 2018-07-13 2019-08-20 腾讯科技(深圳)有限公司 Method and system for detecting and recognizing targets in real-time video
CN110147702B (en) * 2018-07-13 2023-05-23 腾讯科技(深圳)有限公司 Method and system for detecting and identifying target of real-time video
CN111179304B (en) * 2018-11-09 2024-04-05 北京京东尚科信息技术有限公司 Target association method, apparatus and computer readable storage medium
CN111179304A (en) * 2018-11-09 2020-05-19 北京京东尚科信息技术有限公司 Object association method, device and computer-readable storage medium
CN109544603A (en) * 2018-11-28 2019-03-29 上饶师范学院 Target tracking method based on deep transfer learning
CN109902627A (en) * 2019-02-28 2019-06-18 中科创达软件股份有限公司 Object detection method and device
CN109903281A (en) * 2019-02-28 2019-06-18 中科创达软件股份有限公司 Multi-scale-based object detection method and device
CN109903281B (en) * 2019-02-28 2021-07-27 中科创达软件股份有限公司 Multi-scale-based target detection method and device
CN110322475A (en) * 2019-05-23 2019-10-11 北京中科晶上科技股份有限公司 Video sparse detection method
CN110322475B (en) * 2019-05-23 2022-11-11 北京中科晶上科技股份有限公司 Video sparse detection method
CN110428442B (en) * 2019-08-07 2022-04-12 北京百度网讯科技有限公司 Target determination method, target determination system and monitoring security system
CN110428442A (en) * 2019-08-07 2019-11-08 北京百度网讯科技有限公司 Target determination method, target determination system and monitoring security system
CN111461010A (en) * 2020-04-01 2020-07-28 贵州电网有限责任公司 Power equipment identification efficiency optimization method based on template tracking
CN113312949A (en) * 2020-04-13 2021-08-27 阿里巴巴集团控股有限公司 Video data processing method, video data processing device and electronic equipment
CN113312949B (en) * 2020-04-13 2023-11-24 阿里巴巴集团控股有限公司 Video data processing method, video data processing device and electronic equipment
CN112634327A (en) * 2020-12-21 2021-04-09 合肥讯图信息科技有限公司 Tracking method based on YOLOv4 model
CN115063454A (en) * 2022-08-16 2022-09-16 浙江所托瑞安科技集团有限公司 Multi-target tracking matching method, device, terminal and storage medium
CN115063454B (en) * 2022-08-16 2022-11-29 浙江所托瑞安科技集团有限公司 Multi-target tracking matching method, device, terminal and storage medium

Also Published As

Publication number Publication date
CN109416728A (en) 2019-03-01

Similar Documents

Publication Publication Date Title
WO2018058595A1 (en) Target detection method and device, and computer system
JP6511149B2 (en) Method of calculating the area of a fingerprint overlap region, electronic device for performing the same, computer program, and recording medium
US9373054B2 (en) Method for selecting frames from video sequences based on incremental improvement
CN110634153A (en) Target tracking template updating method and device, computer equipment and storage medium
US10552686B2 (en) Object recognition device that determines overlapping states for a plurality of objects
JP6570370B2 (en) Image processing method, image processing apparatus, program, and recording medium
US9747507B2 (en) Ground plane detection
WO2018058530A1 (en) Target detection method and device, and image processing apparatus
US8660302B2 (en) Apparatus and method for tracking target
CN112036232B (en) Image table structure identification method, system, terminal and storage medium
CN111882520A (en) Screen defect detection method and device and head-mounted display equipment
TWI514327B (en) Method and system for object detection and tracking
US9256792B2 (en) Image processing apparatus, image processing method, and program
CN112906483A (en) Target re-identification method and device and computer readable storage medium
US10643338B2 (en) Object detection device and object detection method
CN109447022B (en) Lens type identification method and device
WO2018058573A1 (en) Object detection method, object detection apparatus and electronic device
US10803295B2 (en) Method and device for face selection, recognition and comparison
WO2019148362A1 (en) Object detection method and apparatus
JP2016053763A (en) Image processor, image processing method and program
JP2015026117A (en) Image processing method, image processing apparatus, program, and recording medium
US10410044B2 (en) Image processing apparatus, image processing method, and storage medium for detecting object from image
US10713808B2 (en) Stereo matching method and system using rectangular window
KR101660596B1 (en) Method for modifying gradient of facial shape, and system for the same
JP6175904B2 (en) Verification target extraction system, verification target extraction method, verification target extraction program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16917345

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16917345

Country of ref document: EP

Kind code of ref document: A1